Disabling the hardware prefetcher shows no difference in access time

Question

I have disabled the hardware prefetcher on my systems (both a Core 2 Duo and a Core i7). To disable it, I followed the steps from this question: "How do I programmatically disable hardware prefetching?"

I have also turned off gcc optimization by compiling the program with -O0. After disabling hardware prefetching, I access consecutive cache sets (by reading array indices that map to consecutive sets), but I still get the same results as when hardware prefetching was enabled.

As I understand it, once the hardware prefetcher detects a stride pattern, it fetches two consecutive cache lines (128 bytes) from a higher-level cache or main memory into the lower-level cache. So when a cache line is accessed and misses, it is loaded from the higher-level cache, and the next cache line is preloaded by the prefetcher as well. The first cache line should therefore show a higher access time, while the next one should be faster because the prefetcher has already brought it into L1.

With the hardware prefetcher disabled, the next cache line should not be loaded in advance, even when a stride pattern is present. Accessing it should then miss and be served from the next level of cache, so I expect a higher access time for that line as well.

In practice, however, even after disabling the hardware prefetcher I do not see higher access times for consecutive cache lines, which suggests that the prefetcher is not actually disabled on my machine.

Am I correct?

There is also the L2 streaming (adjacent cache line) prefetcher, which is disabled by default (bit 19 in the MSR).

Is there any way to check whether the hardware prefetcher is really disabled?

Here is my code

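    #include <stdio.h>
    #include <time.h>

    int main()
    {
        int cacheArray[10000], temp;   /* probe array; only read, never written */
        int i, block = 12;
        unsigned long long t1, t2, total;
        struct timespec tim1, tim2;

        for (i = 0; i < 5; i++)
        {
            /* Time the first access: 16 ints = 64 bytes, i.e. one cache line per block. */
            clock_gettime(CLOCK_REALTIME, &tim1);
            temp = cacheArray[block*16];
            clock_gettime(CLOCK_REALTIME, &tim2);

            /* Convert to nanoseconds in 64 bits to avoid overflow on 32-bit builds. */
            t1 = (unsigned long long)tim1.tv_sec*1000000000ULL + tim1.tv_nsec;
            t2 = (unsigned long long)tim2.tv_sec*1000000000ULL + tim2.tv_nsec;
            total = t2 - t1;
            printf("Accessing %d th block took %llu nanosec\n", block, total);

            /* Time the access to the adjacent (next) cache line. */
            block = block + 1;
            clock_gettime(CLOCK_REALTIME, &tim1);
            temp = cacheArray[block*16];
            clock_gettime(CLOCK_REALTIME, &tim2);
            t1 = (unsigned long long)tim1.tv_sec*1000000000ULL + tim1.tv_nsec;
            t2 = (unsigned long long)tim2.tv_sec*1000000000ULL + tim2.tv_nsec;
            total = t2 - t1;
            printf("Accessing %d th block took %llu nanosec\n", block, total);

            /* Jump 20 lines ahead before the next pair of accesses. */
            block = block + 20;
        }
        return 0;
    }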

Here is my sample output:

Accessing 12 th block took 137 nanosec 
Accessing 13 th block took 54 nanosec 
Accessing 33 th block took 39 nanosec 
Accessing 34 th block took 37 nanosec 
Accessing 54 th block took 687 nanosec 
Accessing 55 th block took 93 nanosec 
Accessing 75 th block took 108 nanosec 
Accessing 76 th block took 107 nanosec 
Accessing 96 th block took 109 nanosec 
Accessing 97 th block took 106 nanosec

I expect the same or a higher access time for consecutive cache lines (blocks). Why is the next cache line already in the cache even though the hardware prefetcher is disabled? In theory, a line should not be loaded in advance if it has not been accessed.

Any suggestions or links would be highly appreciated. Thanks in advance.
