Kabira K

Reputation: 2007

Why does multithreaded access to data in the same cache line have a low cache miss rate?

It's been noted that access to data elements that fall in the same cache line performs badly due to the ping-pong effect. However, the code I wrote, tested with valgrind --tool=cachegrind, doesn't show this behaviour. I'd appreciate any insights on why.

Attached below is the function that each pthread executes:

    void* test_cache(void* arg)    /* pthread start routines return void* */
    {
        long id = (long) arg;
        uint32_t idx = (uint32_t) id;    /* this thread's index into the global `shared` array */
        uint32_t total_sum = 0;

        for (uint32_t ctr = 0; ctr < 500000; ++ctr)
        {
            total_sum += shared[idx];               /* read */
            AO_fetch_and_add(&shared[idx], idx);    /* atomic read-modify-write (libatomic_ops) */
        }
        printf("%ld %u\n", id, total_sum);
        return NULL;
    }

Upvotes: 1

Views: 811

Answers (2)

Nikolai Fetissov

Reputation: 84151

If you are running on a "dual core" whatever, the two cores likely share a cache, so you are hitting that shared cache. You need threads on separate physical CPUs to see the ping-pong effect. Include your hardware spec in the question.

Upvotes: 0

Yann Ramin

Reputation: 33167

Reads are OK (once the cache is filled); writes are not, because, depending on the architecture, a write causes all other processors to invalidate that cache line and fetch it again from memory. (Systems that do cache-line snooping, passing the line cache-to-cache, can avoid that penalty.)

The initial cache line load also carries a penalty, since each cache must load the line separately (shared caches do better here), with the situation being worst on NUMA systems, where the line may have to be fetched from a distant processor's memory.

Upvotes: 2
