ideasman42
ideasman42

Reputation: 48228

Rounding up to powers of 2 (with preprocessor constants)

Using unsigned ints it's possible to round to a power of 2 like this:

unsigned int power_of_2_max_u(unsigned int x)
{
    x -= 1;
    x = x | (x >> 1);
    x = x | (x >> 2);
    x = x | (x >> 4);
    x = x | (x >> 8);
    x = x | (x >>16);
    return x + 1;
}

or…

int is_power_of_2_i(int n)
{
    return (n & (n - 1)) == 0;
}

int power_of_2_max_i(int n)
{
    if (is_power_of_2_i(n))
        return n;

    do {
        n = n & (n - 1);
    } while (!is_power_of_2_i(n));

    return n * 2;
}

However when the value is a constant, it should be possible to use the preprocessor to avoid having to calculate the value every time. e. g.:

i = power_of_2_max_u(sizeof(SomeStruct));

This is a macro which is equivalent to power_of_2_max_u.

#define POW2_CEIL(v) (1 + \
(((((((((v) - 1) | (((v) - 1) >> 0x10) | \
      (((v) - 1) | (((v) - 1) >> 0x10) >> 0x08)) | \
     ((((v) - 1) | (((v) - 1) >> 0x10) | \
      (((v) - 1) | (((v) - 1) >> 0x10) >> 0x08)) >> 0x04))) | \
   ((((((v) - 1) | (((v) - 1) >> 0x10) | \
      (((v) - 1) | (((v) - 1) >> 0x10) >> 0x08)) | \
     ((((v) - 1) | (((v) - 1) >> 0x10) | \
      (((v) - 1) | (((v) - 1) >> 0x10) >> 0x08)) >> 0x04))) >> 0x02))) | \
 ((((((((v) - 1) | (((v) - 1) >> 0x10) | \
      (((v) - 1) | (((v) - 1) >> 0x10) >> 0x08)) | \
     ((((v) - 1) | (((v) - 1) >> 0x10) | \
      (((v) - 1) | (((v) - 1) >> 0x10) >> 0x08)) >> 0x04))) | \
   ((((((v) - 1) | (((v) - 1) >> 0x10) | \
      (((v) - 1) | (((v) - 1) >> 0x10) >> 0x08)) | \
     ((((v) - 1) | (((v) - 1) >> 0x10) | \
      (((v) - 1) | (((v) - 1) >> 0x10) >> 0x08)) >> 0x04))) >> 0x02))) >> 0x01))))

Generated by this Python-3 script.

import math
BITS = 32
data = (["(v - 1)"] +
        ["(v | (v >> 0x%02x))" % i
         for i in reversed([2 ** j
         for j in range(int(math.log2(BITS)))])])
macro = "1 + v"
for l in reversed(data):
    macro = macro.replace('v', '(%s)' % l)
print("#define POW2_CEIL(v)", macro) 

Using statement expressions it can be made a little nicer, but this relies on a GCC extension.

#define POW2_CEIL(x_) ({      \
    unsigned int x = x_; \
    x -= 1;              \
    x = x | (x >> 1);    \
    x = x | (x >> 2);    \
    x = x | (x >> 4);    \
    x = x | (x >> 8);    \
    x = x | (x >>16);    \
    x + 1; })

While this is a fairly ugly macro, I double checked and the compiler does reduce this correctly to a constant (as it should) so unsigned int a = 8; is exactly equivalent to unsigned int a = POW2_CEIL(7);, however I was interested to know if there was a better way to do this (possibly not limited to an int range).

Upvotes: 5

Views: 3009

Answers (2)

However when the value is a constant, it should be possible to use the preprocessor to avoid having to calculate the value every time.

Readability >>> performance. First of all, ask yourself the question" "is this a real bottleneck in my code?"

If the answer is "no", then don't do anything; move on without thinking about micro-"optimizing" your code prematurely.

Even if this is an important part of your code, consider that your functions are pure. Their return value only depends on their arguments, since they perform simple computations (a mapping, basically). Chances are, any decent optimizing compiler will constant-fold your functions when called with expressions known at compile-time.

For example, the following code…

#include <stdio.h>

static unsigned power_of_2_max_u(unsigned x)
{
    x -= 1;
    x = x | (x >> 1);
    x = x | (x >> 2);
    x = x | (x >> 4);
    x = x | (x >> 8);
    x = x | (x >>16);
    return x + 1;
}

struct Foo {
    int x;
    char c;
    int i;
};

int main()
{
    printf("%zu %u\n", sizeof(struct Foo), power_of_2_max_u(sizeof(struct Foo)));

    return 0;
}

…is translated to the following assembly by clang -O2 -flto:

/tmp/constfold:
(__TEXT,__text) section
_main:
0000000100000f50    pushq   %rbp
0000000100000f51    movq    %rsp, %rbp
0000000100000f54    leaq    0x37(%rip), %rdi  ## literal pool for: "%zu %u\n"
0000000100000f5b    movl    $0xc, %esi        ## *** sizeof(struct Foo) == 12
0000000100000f60    movl    $0x10, %edx       ## *** rounded up == 16
0000000100000f65    xorl    %eax, %eax
0000000100000f67    callq   0x100000f70       ## symbol stub for: _printf
0000000100000f6c    xorl    %eax, %eax
0000000100000f6e    popq    %rbp
0000000100000f6f    retq

So, you didn't need to do anything for the compiler to constant-fold your function.

Only consider re-writing your functions as one of the macros above (or using non-portable compiler intrinsics) if you can observe that they are not being constant-folded.

Upvotes: 3

oakad
oakad

Reputation: 7625

How about using a nice compiler built-in (works in gcc and clang):

int msb = 31 - __builtin_clz(v);
int log2_value = v > (1U << msb) ? msb + 1 : msb;

or, for 64b case:

int msb = 63 - __builtin_clzl(v);
int log2_value = v > (1ULL << msb) ? msb + 1 : msb;

(v is an input variable).

Wrapping the above into a macro or c++ constexpr is left as exercise to a reader.

Using MSVC, a built-in, similar to __builtin_clz is called _BitScanReverse (http://msdn.microsoft.com/en-us/library/fbxyd7zd%28v=vs.85%29.aspx).

Upvotes: 0

Related Questions