Reputation: 7867

Is there any performance difference between accessing a hardcode array and a run time initialization array?

For example, I want to create a square root table using array SQRT[i] to optimize a game, but I don't know if there is a performance difference between the following initialization when accessing the value of SQRT[i]:

Hardcode array

int SQRT[]={0,1,1,1,2,2,2,2,2,3,3,.......255,255,255}

Generate value at run time

int SQRT[65536];
int main(){
    for(int i=0;i<65536;i++){
        SQRT[i]=sqrt(i);
    }
    //other code
    return 0;
}

Some example of accessing them:

    if(SQRT[a*a+b*b]>something)
    ...

I'm not clear if the program stores or accesses a hard-code array in a different way, also I don't know if the compiler would optimize the hard-code array to speed up the access time, is there performance differences between them when accessing the array?

Upvotes: 5

Answers (3)

Prashanth

Reputation: 11

The access time will be the same. When you hardcode an array, the C library routines called before the main will initialize it (in an embedded system the start up code copies the Read write data, i.e. hardcoded from ROM to RAM address where the array is located, if the array is constant, then it is accessed directly from ROM).

If the for loop is used to initialize, then there is an overhead of calling the Sqrt function.

Upvotes: 0

Peter Cordes

Reputation: 364039

As people said in comments:

if(SQRT[a*a+b*b]>something)

is a horrible example use-case. If that's all you need SQRT for, just square something.

As long as you can tell the compiler that SQRT doesn't alias anything, then a run-time loop will make your executable smaller, and only add a tiny amount of CPU overhead during startup. Definitely use uint8_t, not int. Loading a 32bit temporary from an 8bit memory location is no slower than from a zero-padded 32b memory location. (The extra movsx instruction on x86, instead of using a memory operand, will more than pay for itself in reduced cache pollution. RISC machines typically don't allow memory operands anyway, so you always need an instruction to load the value into a register.)

Also, sqrt is 10-21 cycle latency on Sandybridge. If you don't need it often, the int->double, sqrt, double->int chain is not much worse than an L2 cache hit. And better than going to L3 or main memory. If you need a lot of sqrt, then sure, make a LUT. The throughput will be much better, even if you're bouncing around in the table and causing L1 misses.

You could optimize the initialization by squaring instead of sqrting, with something like

uint8_t sqrt_lookup[65536];
void init_sqrt (void)
{
    int idx = 0;
    for (int i=0 ; i < 256 ; i++) {
        // TODO: check that there isn't an off-by-one here
        int iplus1_sqr = (i+1)*(i+1);
        memset(sqrt_lookup+idx, i, iplus1_sqr-idx);
        idx = iplus1_sqr;
    }
}

You can still get the benefits of having sqrt_lookup being const (compiler knows it can't alias). Either use restrict, or lie to the compiler, so users of the table see a const array, but you actually write to it.

This might involve lying to the compiler, by having it declared extern const in most places, but not in the file that initializes it. You'd have to make sure this actually works, and doesn't create code referring to two different symbols. If you just cast away the const in the function that initializes it, you might get a segfault if the compiler placed it in rodata (or read-only bss memory if it's uninitialized, if that's possible on some platforms?)

Maybe we can avoid lying to the compiler, with:

uint8_t restrict private_sqrt_table[65536];  // not sure about this use of restrict, maybe that will do it?
const uint8_t *const sqrt_lookup = private_sqrt_table;

Actually, that's just a const pointer to const data, not a guarantee that what it's pointing to can't be changed by another reference.

Upvotes: 1

user1084944

Reputation:

First, you should do the hard-coded array right:

static const int SQRT[]={0,1,1,1,2,2,2,2,2,3,3,.......255,255,255};

(also using uint8_t instead of int is probably better, so as to reduce the size of the array and making it more cache-friendly)

Doing this has one big advantage over the alternative: the compiler can easily check that the contents of the array cannot be changed.

Without this, the compiler has to be paranoid -- every function call might change the contents of SQRT, and every pointer could potentially be pointing into SQRT and thus any write through an int* or char* might be modifying the array. If the compiler cannot prove this doesn't happen, then that puts limits on the kinds of optimizations it can do, which in some cases could show a performance impact.

Another potential advantage is the ability to resolve things involving constants at compile-time.

If needed, you may be able to help the compiler figure things out with clever use of __restrict__.

In modern C++ you can get the best of both worlds; it should be possible (and in a reasonable way) to write code that would run at compile time to initialize SQRT as a constexpr. That would best be asked a new question, though.

Upvotes: 4

Is there any performance difference between accessing a hardcode array and a run time initialization array?

Answers (3)

Related Questions