1. Don't do the compiler's job

Consider the code fragment, taken from MySQL project. The code contains an error that PVS-Studio analyzer diagnoses in the following way: V525 The code containing the collection of similar blocks. Check items '0', '1', '2', '3', '4', '1', '6' in lines 680, 682, 684, 689, 691, 693, 695.

static int rr_cmp(uchar *a,uchar *b)
{
  if (a[0] != b[0])
    return (int) a[0] - (int) b[0];
  if (a[1] != b[1])
    return (int) a[1] - (int) b[1];
  if (a[2] != b[2])
    return (int) a[2] - (int) b[2];
  if (a[3] != b[3])
    return (int) a[3] - (int) b[3];
  if (a[4] != b[4])
    return (int) a[4] - (int) b[4];
  if (a[5] != b[5])
    return (int) a[1] - (int) b[5];     <<<<====
  if (a[6] != b[6])
    return (int) a[6] - (int) b[6];
  return (int) a[7] - (int) b[7];
}

Explanation

This is a classic error, related to copying fragments of code (Copy-Paste). Apparently, the programmer copied the block of code"if (a[1] != b[1]) return (int) a[1] - (int) b[1];". Then he started changing the indices and forgot to replace "1" with "5". This resulted in the comparison function occasionally returning an incorrect value; this issue is going to be difficult to notice. And it's really hard to detect since all the tests had not revealed it before we scanned MySQL with PVS-Studio.

Correct code

if (a[5] != b[5])
  return (int) a[5] - (int) b[5];

Recommendation

Although the code is neat and easy-to-read, it didn't prevent the developers from overlooking the error. You can't stay focused when reading code like this because all you see is just similar looking blocks, and it's hard to concentrate the whole time.

These similar blocks are most likely a result of the programmer's desire to optimize the code as much as possible. He "unrolled the loop" manually. I don't think it was a good idea in this case.

Firstly, I doubt that the programmer has really achieved anything with it. Modern compilers are pretty smart, and are very good at automatic loop unrolling if it can help improve program performance.

Secondly, the bug appeared in the code because of this attempt to optimize the code. If you write a simpler loop, there will be less chance of making a mistake.

I'd recommend rewriting this function in the following way:

static int rr_cmp(uchar *a,uchar *b)
{
  for (size_t i = 0; i < 7; ++i)
  {
    if (a[i] != b[i])
      return a[i] - b[i]; 
  }
  return a[7] - b[7];
}

Advantages:

  • The function is easier to read and comprehend.
  • You are much less likely to make a mistake writing it.

I am quite sure that this function will work no slower than its longer version.

So, my advice would be - write simple and understandable code. As a rule, simple code is usually correct code. Don't try to do the compiler's job - unroll loops, for example. The compiler will most definitely do it well without your help. Doing such fine manual optimization work would only make sense in some particularly critical code fragments, and only after the profiler has already estimated those fragments as problematic (slow).

results matching ""

    No results matching ""