Kabira K

Reputation: 2007

Why does multithreaded access to data in the same cache line have a low cache miss rate?

It's been noted that access to data elements that fall in the same cache line performs badly due to the ping-pong effect. However, the code I wrote, tested with valgrind --tool=cachegrind, doesn't show this behaviour. I'd appreciate any insights on why.

Attached below is the function that each pthread executes:

    void* test_cache(void* arg)    /* pthread start routines return void* */
    {
        long id = (long) arg;
        uint32_t idx = (uint32_t) id;    /* this thread's index into the global `shared` array */
        uint32_t total_sum = 0;

        for (uint32_t ctr = 0; ctr < 500000; ++ctr)
        {
            total_sum += shared[idx];               /* read */
            AO_fetch_and_add(&shared[idx], idx);    /* atomic read-modify-write (libatomic_ops) */
        }
        printf("%ld %u\n", id, total_sum);
        return NULL;
    }

Upvotes: 1

Views: 811

Answers (2)

Nikolai Fetissov

Reputation: 84151

If you are running on a "dual core" whatever, the two cores likely share a cache, so you are hitting that shared cache. You need threads on separate physical CPUs to see the ping-pong effect. Include your hardware spec in the question.

Upvotes: 0

Yann Ramin

Reputation: 33167

Reads are OK (once the cache is filled); writes are not, because, depending on the architecture, a write causes all other processors to invalidate that cache line and fetch it again from memory. (Systems that do cache-line snooping, passing the line cache-to-cache, can avoid that penalty.)

The initial cache line load also carries a penalty, since each cache must load the line separately (shared caches do better here), with the situation being worst on NUMA systems, where the line may have to be fetched from a distant processor's memory.

Upvotes: 2
