AR89
AR89

Reputation: 3638

C pointer arithmetics

This is the code that I don't understand, it simply reverse a string.

#include <stdio.h>

void strrev(char *p)
{
  char *q = p;
  while(q && *q) ++q;
  for(--q; p < q; ++p, --q)
    *p = *p ^ *q,
    *q = *p ^ *q,
    *p = *p ^ *q;
}

int main(int argc, char **argv)
{
  do {
    printf("%s ",  argv[argc-1]); strrev(argv[argc-1]);
    printf("%s\n", argv[argc-1]);
  } while(--argc);

  return 0;
}

The only piece of code that I don't understand is this one: while(q && *q) ++q;, it is used to find the eos. Isn't it the same as while(*q) ++q;, since q is never going to be 0? How can the author of the code be sure that q or *q are going to be 0?

This code comes from this question: How do you reverse a string in place in C or C++?

Upvotes: 9

Views: 7194

Answers (3)

Wojtek Surowka
Wojtek Surowka

Reputation: 21013

The code

while(q && *q)

is a shorthand for

while(q != NULL && *q != '\0')

so you are testing if q (at the beginning equal to p) is not NULL. It means that the function called with NULL argument will not crash in this while loop. (But it will still crash in the second loop).

Upvotes: 5

haccks
haccks

Reputation: 106122

Isn't it the same as while(*q) ++q;, since q is never going to be 0?

while(q && *q) is used to ensure that q is not NULL and *q is not NUL. Using while(*q) is also legal if q does not points to NULL.

void string_rev(char *p)
{
   char *q = p;

   if(q == NULL)
       return;

   while(*q) ++q;
   for(--q; p < q; ++p, --q)
       *p = *p ^ *q,
       *q = *p ^ *q,
       *p = *p ^ *q;

}  

How can the author of the code be sure that q or *q are going to be 0?

while(q && *q), q could be a NULL pointer if p points to the NULL and loop terminates before entering the loop body. Otherwise through out the operation it can't be a NULL pointer. Now the loop termination depends on *q. Once q reaches the end of the string, *q becomes 0 and loop terminates.

Upvotes: 1

Eric Lippert
Eric Lippert

Reputation: 660513

David Heffernan's comment is correct. That code is appalling.

The point of the code you are asking about is to skip dereferencing q if it is null. Therefore the author of the code believes that q could be null. Under what circumstances can q be null? The most obvious is: if p is null.

So let's see what the code does when p is null.

void strrev(char *p) // Assumption: p is null
{
  char *q = p; // now q is null
  while(q && *q) ++q; // The loop is skipped, so q and p are both still null.
  for(--q; 

So the first thing we do is decrement q, which is null. Likely this will wrap around and we will get as a result q containing the largest possible pointer.

    p < q; 

Since null is smaller than everything except null, and q is no longer null, this is true. We enter the loop...

    ++p, --q)
    *p = *p ^ *q,

And promptly dereference null.

    *q = *p ^ *q,
    *p = *p ^ *q;
}

Incidentally, at Coverity we refer to this as a "forward null defect" -- that is, the pattern where a code path indicates that a value is expected to be potentially null, and then later on the same code path assumes that it is not null. It is extremely common.

OK, so this code is completely broken if given null as an argument. Are there other ways that it is broken? What happens if we give it an empty string?

void strrev(char *p) // Assumption: *p is 0
{
  char *q = p; // *q is 0
  while(q && *q) ++q; // The second condition is not met so the body is skipped.
  for(--q; // q now points *before valid memory*.
       p < q  // And we compare an invalid pointer to a valid one.

Do we have a guarantee in C that says that when you subtract one from a pointer to valid memory, and then compare that pointer to another pointer, that the comparison is sensible? Because this strikes me as incredibly dangerous. I do not know the C standard well enough to say whether this is undefined behaviour or not.

Moreover, this code uses the terrible "swap two chars with xor" trick. Why on earth would anyone do this? It generates larger, slower machine code and is harder to read, understand and maintain. If you want to swap two things, swap them.

It also uses the comma operator to put multiple statements in a single statement so as to avoid the horror of braces around the body of the for. What is the purpose of this oddity? The purpose of code isn't to show how many operators you know, it is first and foremost to communicate to the reader of the code.

The function also modifies its formal parameter, which makes it hard to debug.

Upvotes: 33

Related Questions