23. Evaluate the string literal length automatically

The fragment is taken from the OpenSSL library. The error is detected by the following PVS-Studio diagnostic: V666 Consider inspecting the third argument of the function 'strncmp'. It is possible that the value does not correspond with the length of a string which was passed with the second argument.

if (!strncmp(vstart, "ASCII", 5))
  arg->format = ASN1_GEN_FORMAT_ASCII;
else if (!strncmp(vstart, "UTF8", 4))
  arg->format = ASN1_GEN_FORMAT_UTF8;
else if (!strncmp(vstart, "HEX", 3))
  arg->format = ASN1_GEN_FORMAT_HEX;
else if (!strncmp(vstart, "BITLIST", 3))
  arg->format = ASN1_GEN_FORMAT_BITLIST;
else
  ....

Explanation

It's very hard to stop using magic numbers. Also, it would be very unreasonable to get rid of such constants as 0, 1, -1, 10. It's rather difficult to come up with names for such constants, and often they will make reading of the code more complicated.

However, it's very useful to reduce the number of magic numbers. For example, it would be helpful to get rid of magic numbers which define the length of string literals.

Let's have a look at the code given earlier. The code was most likely written using the Copy-Paste method. A programmer copied the line:

else if (!strncmp(vstart, "HEX", 3))

After that "HEX" was replaced by "BITLIST", but the programmer forgot to change 3 to 7. As a result, the string is not compared with "BITLIST", only with "BIT". This error might not be a crucial one, but still it is an error.

It's really bad that the code was written using Copy-Paste. What's worse is that the string length was defined by a magic constant. From time to time we come across such errors, where the string length does not correspond with the indicated number of symbols because of a typo or carelessness of a programmer. So it's quite a typical error, and we have to do something about it. Let's look closely at the question of how to avoid such errors.

Correct code

First it may seem that it's enough to replace strncmp() call with strcmp(). Then the magic constant will disappear.

else if (!strcmp(vstart, "HEX"))

Too bad-we have changed the logic of the code work. The strncmp() function checks if the string starts with "HEX", and the function strcmp() checks if the strings are equal. There are different checks.

The easiest way to fix this is to change the constant:

else if (!strncmp(vstart, "BITLIST", 7))
  arg->format = ASN1_GEN_FORMAT_BITLIST;

This code is correct, but it is very bad because the magic 7 is still there. That's why I would recommend a different method.

Recommendation

Such an error can be prevented if we explicitly evaluate the string length in the code. The easiest option is to use the strlen() function.

else if (!strncmp(vstart, "BITLIST", strlen("BITLIST")))

In this case it will be much easier to detect a mismatch if you forget to fix one of the strings:

else if (!strncmp(vstart, "BITLIST", strlen("HEX")))

But the suggested variant has two disadvantages:

  1. There is no guarantee that the compiler will optimize the strlen() call and replace it with a constant.
  2. You have to duplicate the string literal. It does not look graceful, and can be the subject of a possible error.

The first issue can be dealt with by using special structures for literal length evaluation during the compilation phase. For instance, you can use a macro such as:

#define StrLiteralLen(arg) ((sizeof(arg) / sizeof(arg[0])) - 1)
....
else if (!strncmp(vstart, "BITLIST", StrLiteralLen("BITLIST")))

But this macros can be dangerous. The following code can appear during the refactoring process:

const char *StringA = "BITLIST"; 
if (!strncmp(vstart, StringA, StrLiteralLen(StringA)))

In this case StrLiteralLen macro will return some nonsense. Depending on the pointer size (4 or 8 byte) we will get the value 3 or 7. But we can protect ourselves from this unpleasant case in C++ language, by using a more complicated trick:

template <typename T, size_t N>
char (&ArraySizeHelper(T (&array)[N]))[N];
#define StrLiteralLen(str) (sizeof(ArraySizeHelper(str)) - 1)

Now, if the argument of the StrLiteralLen macro is a simple pointer, we won't be able to compile the code.

Let's have a look at the second issue (duplicating of the string literal). I have no idea what to say to C programmers. You can write a special macro for it, but personally I don't like this variant. I am not a fan of macros. That's why I don't know what to suggest.

In C++ everything is fabulously awesome. Moreover, we solve the first problem in a really smart way. The template function will be of a great help to us. You can write it in different ways, but in general it will look like this:

template<typename T, size_t N>
int mystrncmp(const T *a, const T (&b)[N])
{
  return _tcsnccmp(a, b, N - 1);
}

Now the string literal is used only once. The string literal length is evaluated during the compilation phase. You cannot accidentally pass a simple pointer to the function and incorrectly evaluate the string length. Presto!

Summary: try to avoid magic numbers when working with strings. Use macros or template functions; the code will become not only safer, but more beautiful and shorter.

As an example, you can look at the declaration of a function strcpy_s ():

errno_t strcpy_s(
   char *strDestination,
   size_t numberOfElements,
   const char *strSource 
);
template <size_t size>
errno_t strcpy_s(
   char (&strDestination)[size],
   const char *strSource 
); // C++ only

The first variant is intended for the C language, or in the case of a buffer size not being known in advance. If we work with the buffer, created on the stack, then we can use the second variant in C++:

char str[BUF_SIZE];
strcpy_s(str, "foo");

There are no magic numbers, there is no evaluation of the buffer size at all. It's short and sweet.

results matching ""

    No results matching ""