Reputation: 2975

Explanation of C++ FAQ's unsafe macro?

According to the C++ FAQ, macros are evil:

[9.5] Why should I use inline functions instead of plain old #define macros?

Because #define macros are evil in 4 different ways: evil#1, evil#2, evil#3, and evil#4. Sometimes you should use them anyway, but they're still evil. Unlike #define macros, inline functions avoid infamous macro errors since inline functions always evaluate every argument exactly once. In other words, invoking an inline function is semantically just like invoking a regular function, only faster:
// A macro that returns the absolute value of i
#define unsafe(i)  \
        ( (i) >= 0 ? (i) : -(i) )

// An inline function that returns the absolute value of i
inline
int safe(int i)
{
  return i >= 0 ? i : -i;
}

int f();

void userCode(int x)
{
  int ans;

  ans = unsafe(x++);   // Error! x is incremented twice
  ans = unsafe(f());   // Danger! f() is called twice

  ans = safe(x++);     // Correct! x is incremented once
  ans = safe(f());     // Correct! f() is called once
}
Also unlike macros, argument types are checked, and necessary conversions are performed correctly.

Macros are bad for your health; don't use them unless you have to.

Can someone explain why is unsafe(x++) increments x twice? I am not able to figure out.

Upvotes: 32

Answers (4)

luser droog

Reputation: 19504

Running it through the preprocessor shows the problem. Using gcc -E (can also use cpp -P, where the -P option also suppresses generated # lines),

inline
int safe(int i)
{
  return i >= 0 ? i : -i;
}

int f();

void userCode(int x)
{
  int ans;

  //    increment 1      increment 2 (one of these)
  //        |             |     |
  //        V             V     V
  ans = ( (x++) >= 0 ? (x++) : -(x++) );
  ans = ( (f()) >= 0 ? (f()) : -(f()) );

  ans = safe(x++);
  ans = safe(f());
}

As artless noise notes, the function f() is also called twice by the unsafe macro. Perhaps it's pure (has no side-effects) so it's not wrong, per se. But still suboptimal.

So, since inline functions are generally safer than function-like macros because they work on the same semantic level with the other basic elements: variables and expressions; and for manifest constants, enums can often be more tidy; what are the good uses of macros?

Setting constants known only at compile-time. You can define a macro from the command-line when compiling. Instead of

#define X 12

in the source file, you can add

-DX=12

to the cc command. You can also #undef X from the command-line with -UX.

This allows things like conditional-compilation, eg.

#if X
   do this;
#else
   do that;
#endif
   while (loop);

to be controlled by a makefile, itself perhaps generated with a configure script.

X-Macros. The most compelling use for X-Macros, IMO, is associating enum identifiers with printable strings. While it make look funny at first, it reduces duplication and synchronization issues with these kinds of parallel definitions.

#define NAMES(_) _(Alice) _(Bob) _(Caravaggio) _(DuncanIdaho)
#define BARE(_) _ ,
#define STRG(_) #_ ,
enum { NAMES(BARE) };
char *names[] = { NAMES(STRG) };

Notice that you can pass a macro's name as an argument to another macro and then call the passed macro by using the argument as if it were itself a macro (because it is one). For more on X-Macros, see this question.

Upvotes: 69

Kaz

Reputation: 58617

unsafe(x) evaluates the expression x twice. Once to determine its truth value, and then a second time in one of the two branches of the ternary operator. The inline function safe receives an evaluated argument: the expression is evaluated once prior to the function call, and the function call works with local variables.

unsafe is actually not quite as unsafe as it could be. The ternary operator introduces a sequence point between evaluating the test, and evaluating either the consequent or alternative expression. unsafe(x++) will reliably increments x twice, though, of course, the problem is that this behavior is unexpected. In general, macros which expand an expression more than once do not have this assurance. Usually, they produce outright undefined behavior!

Circa 1999 I produced a library module module for catching uses of macros with side effects.

So, you can write "evil" macros and use them, and the machine will catch situations where they are accidentally used with arguments that have side effects (provided you have adequate code coverage to hit those uses at run-time).

Here is the test program, unsafe.c. Note that it includes a header file sfx.h and uses a SFX_CHECK macro in the expansion token sequence of unsafe:

#include "sfx.h"

#define unsafe(i)  \
          ( (SFX_CHECK(i)) >= 0 ? (i) : -(i) )

inline
int safe(int i)
{
  return i >= 0 ? i : -i;
}

int f(void)
{
  return 0;
}

int main(void)
{
  int ans;
  int x = 0;

  ans = unsafe(x++);   // Error! x is incremented twice
  ans = unsafe(f());   // Danger! f() is called twice

  ans = safe(x++);     // Correct! x is incremented once
  ans = safe(f());     // Correct! f() is called once
}

We compile everything and run from a Linux shell prompt:

$ gcc unsafe.c hash.c except.c sfx.c -o unsafe
$ ./unsafe
unsafe.c:22: expression "x++" has side effects
unsafe.c:23: expression "f()" may have side effects

Note that x++ certainly has side effects, whereas f may or may not. So the messages are differently worded. A function being called twice is not necessarily an issue, because a function might be pure (have no side effects).

You can get that here if you're curious about how it works. There is a run-time penalty if the debugging is enabled; of course SFX_CHECK can be disabled so it does nothing (similarly to assert).

The first time an SFX_CHECK protected expression is evaluated, it is parsed in order to determine whether it might have side effects. Because this parsing is done without any access to symbol table information (how identifiers are declared), it is ambiguous. The parser pursues multiple parsing strategies. Backtracking is done using an exception-handling library based on setjmp/longjmp. The parse results are stored in a hash table keyed on the textual expression for faster retrieval on future evaluations.

Upvotes: 4