11. Don't try to squeeze as many operations as possible in one line
The fragment is taken from Godot Engine project. The error is detected by the following PVS-Studio diagnostic: V567 Undefined behavior. The 't' variable is modified while being used twice between sequence points.
static real_t out(real_t t, real_t b, real_t c, real_t d) { return c * ((t = t / d - 1) * t * t + 1) + b; }
Explanation
Sometimes, you can come across code fragments where the authors try to squeeze as much logic as possible into a small volume of code, by means of complex constructs. This practice hardly helps the compiler, but it does make the code harder to read and understand for other programmers (or even the authors themselves). Moreover, the risk of making mistakes in such code is much higher, too.
It is in such fragments, where programmers try to put lots of code in just a few lines, that errors related to undefined behavior are generally found. They usually have to do with writing in and reading from one and the same variable within one sequence point. For a better understanding of the issue, we need to discuss in more detail the notions of "undefined behavior" and "sequence point".
Undefined behavior is the property of some programming languages to issue a result that depends on the compiler implementation or switches of optimization. Some cases of undefined behavior (including the one being discussed here) are closely related to the notion of a "sequence point".
A sequence point defines any point in a computer program's execution at which it is guaranteed that all side effects of previous evaluations will have been performed, and no side effects from subsequent evaluations have yet been revealed. In C/C++ programming languages there are following sequence points:
- sequence points for operators "&&", "||", ",". When not overloaded, these operators guarantee left-to-right execution order;
- sequence point for ternary operator "?:";
- sequence point at the end of each full expression (usually marked with ';');
- sequence point in place of the function call, but after evaluating the arguments;
- sequence point when returning from the function.
Note. The new C++ standard has discarded the notion of a "sequence point", but we'll be using the above given explanation to let those of you unfamiliar with the subject, grasp the general idea easier and faster. This explanation is simpler than the new one, and is sufficient for us to understand why one shouldn't squeeze lots of operations into one "pile".
In the example we have started with, there is none of the above mentioned sequence points, while the '=' operator, as well as the parentheses, can't be treated as such. Therefore, we cannot know which value of the t variable will be used when evaluating the return value.
In other words, this expression is one single sequence point, so it is unknown in what order the t variable will be accessed. For instance, the "t * t" subexpression may be evaluated before or after writing into the " t = t / d - 1" variable.
Correct code
static real_t out(real_t t, real_t b, real_t c, real_t d) { t = t / d - 1; return c * (t * t * t + 1) + b; }
Recommendation
It obviously wasn't a good idea to try to fit the whole expression in one line. Besides it being difficult to read, it also made it easier for an error to sneak in.
Having fixed the defect and split the expression into two parts, we have solved 2 issues at once - made the code more readable, and gotten rid of undefined behavior by adding a sequence point.
The code discussed above is not the only example, of course. Here's another:
*(mem+addr++) = (opcode >= BENCHOPCODES) ? 0x00 : ((addr >> 4)+1) << 4;
Just as in the previous case, the error in this code has been caused by unreasonably complicated code. The programmer's attempt to increment the addr variable within one expression has led to undefined behavior as it is unknown which value the addr variable will have in the right part of the expression - the original or the incremented one.
The best solution to this problem is the same as before - do not complicate matters without reason; arrange operations in several expressions instead of putting them all in one:
*(mem+addr) = (opcode >= BENCHOPCODES) ? 0x00 : ((addr >> 4)+1) << 4; addr++;
There is a simple yet useful conclusion to draw from all of this - do not try to fit a set of operations in as few lines if possible. It may be more preferable to split the code into several fragments, thus making it more comprehensible, and reducing the chance errors occuring.
Next time you're about to write complex constructs, pause for a while and think what using them will cost you, and if you are ready to pay that price.