Reputation: 8087
It's said that for a normal x86 CPU (e.g. an i7 Mac) the cache line size is 64 bytes, so an array smaller than 64 bytes should fit in a single cache line and be fast to access, while a program working on an array spanning many cache lines should slow down.
Below is my program:
#include <sys/time.h>
#include <stdlib.h>
#include <stdio.h>

size_t cacheline = 16;

int main(int argc, char* argv[]) {
    size_t loopCount = 2000000000;
    if (argc == 2) { loopCount = atol(argv[1]); }
    printf("loop=%zu\n", loopCount);
    int array[cacheline];
    for (size_t a = 0; a < cacheline; ++a) {
        array[a] = a;
    }
    size_t c = 0;
    long sum = 1;
    for (size_t i = 0; i < loopCount; ++i) {
        if (c == cacheline) c = 0;
        sum += array[c++];  // walk the array sequentially, wrapping at `cacheline`
    }
    printf("sum=%ld\n", sum);
    return 0;
}
clang++ test07_cacheline.cpp -O2 -o test07_cacheline && time ./test07_cacheline 2000000000
loop=2000000000
sum=63354543092609
real 0m2.810s
user 0m2.794s
sys 0m0.009s
But in my test program I found that no matter whether I set the size of "array" to 16, 64, 256 or 65536, the execution time is basically the same. What's wrong with the theory or with my program design? I also tried some other programs from the internet, with the same result, as below:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

long timediff(clock_t t1, clock_t t2) {
    return (t2 - t1) * 1000 / CLOCKS_PER_SEC;
}

int main(int argc, char* argv[]) {
    int array_size = 65536;
    if (argc >= 2) array_size = atoi(argv[1]);
    int repeat_times = 2000000000;
    long array[array_size];
    for (int i = 0; i < array_size; ++i) {
        array[i] = i;
    }
    int j = 0;
    int k = 0;
    int c = 0;
    clock_t start = clock();
    while (j++ < repeat_times) {
        if (k == array_size) { k = 0; }
        c += array[k++];
    }
    clock_t end = clock();
    printf("c=%d,%ld\n", c, timediff(start, end));
    return 0;
}
g++ test08_cacheline.cpp -O2 -o test08_cacheline && ./test08_cacheline
c=1865233920,2800
The time is the same no matter how big array_size is. So, any explanation of how the cache line affects performance in my program?
Upvotes: 1
Views: 254
Reputation: 2790
What you're missing is that both executions exploit data locality.
You're reading from a contiguous array, so the cache pulls in contiguous blocks of it as you go; the only difference between array sizes is the number of blocks that get loaded.
That number is not very large, and on top of that the CPU's hardware prefetcher can predict a sequential access pattern and load the next block before you ask for it, and the compiler helps too when optimizations such as vectorization are enabled.
For more, read here.
If you want to see how performance changes, try modifying your code this way:

while (j++ < repeat_times) {
    if (k == array_size) { k = 0; }
    int position = ((c + k) * j) % array_size;  // scatter the accesses across the array
    c += array[position];
    k++;
}

This way you lose data locality.
Performance depends more on the memory access pattern than on the cache size. More precisely, if the program accesses memory mainly sequentially, cache size is not a big deal. If there are a lot of random accesses, cache size really matters.
If you want to see how cache size leads to different performance, you can try varying the array size while keeping the random access pattern.
This way, if the array fits in your cache, each random access will find its block already in the cache.
N.B. You're not the only one using the cache on your PC!
Upvotes: 2