Reputation: 8087
It's said that for a normal x86 CPU (e.g. an i7 Mac) the cache line size is 64 bytes, so an array smaller than 64 bytes should fit in a single cache line and be fast to access, while a program working on an array spanning many cache lines should slow down.
Below is my program:
#include <sys/time.h>
#include <stdlib.h>
#include <stdio.h>

size_t cacheline = 16;

int main(int argc, char* argv[]) {
    size_t loopCount = 2000000000;
    if (argc == 2) { loopCount = atol(argv[1]); }
    printf("loop=%zu\n", loopCount);
    int array[cacheline];
    for (size_t a = 0; a < cacheline; ++a) {
        array[a] = a;
    }
    size_t c = 0;
    long sum = 1;
    for (size_t i = 0; i < loopCount; ++i) {
        if (c == cacheline) c = 0;
        sum += array[c++];  // walk the array sequentially, wrapping at `cacheline`
    }
    printf("sum=%ld\n", sum);
    return 0;
}
clang++ test07_cacheline.cpp -O2 -o test07_cacheline && time ./test07_cacheline 2000000000
loop=2000000000
sum=63354543092609
real 0m2.810s
user 0m2.794s
sys 0m0.009s
But in my test program I found that no matter whether I set the size of "array" to 16, 64, 256 or 65536, the execution time is basically the same. What's wrong with the theory or with my program design? I also tried some other programs from the internet, with the same result, as below:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

long timediff(clock_t t1, clock_t t2) {
    return (t2 - t1) * 1000 / CLOCKS_PER_SEC;
}

int main(int argc, char* argv[]) {
    int array_size = 65536;
    if (argc >= 2) array_size = atoi(argv[1]);
    int repeat_times = 2000000000;
    long array[array_size];
    for (int i = 0; i < array_size; ++i) {
        array[i] = i;
    }
    int j = 0;
    int k = 0;
    int c = 0;
    clock_t start = clock();
    while (j++ < repeat_times) {
        if (k == array_size) { k = 0; }
        c += array[k++];
    }
    clock_t end = clock();
    printf("c=%d,%ld\n", c, timediff(start, end));
    return 0;
}
g++ test08_cacheline.cpp -O2 -o test08_cacheline && ./test08_cacheline
c=1865233920,2800
The time is the same no matter how big array_size is. So, any explanation of how the cache line affects performance in my program?
Upvotes: 1
Views: 254
Reputation: 2790
What you're missing is that both executions exploit data locality.
You're reading from a contiguous array, so the cache pulls in contiguous blocks of it as you go; the only difference between array sizes is the number of blocks that get loaded.
That number is not very large, and on top of that the CPU's hardware prefetcher can predict a sequential access pattern and load the next block before you ask for it, and the compiler helps too when optimizations such as vectorization are enabled.
For more, read here.
If you want to see how performance changes, try modifying your code this way:

while (j++ < repeat_times) {
    if (k == array_size) { k = 0; }
    int position = ((c + k) * j) % array_size;  // scatter the accesses across the array
    c += array[position];
    k++;
}

This way you lose data locality.
Performance depends more on the memory access pattern than on the cache size. More precisely, if the program accesses memory mainly sequentially, cache size is not a big deal. If there are a lot of random accesses, cache size really matters.
If you want to see how cache size leads to different performance, you can try varying the array size while keeping the random access pattern.
This way, if the array fits in your cache, each random access will find its block already in the cache.
N.B. You're not the only one using the cache on your PC!
Upvotes: 2