Reputation: 18821
I am running some experiments with Cachegrind, Callgrind and gem5. I noticed that certain memory accesses are counted as reads by Cachegrind, as writes by Callgrind, and as both a read and a write by gem5.
Let's take a very simple example:
int main() {
    int i, l;
    for (i = 0; i < 1000; i++) {
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        l++;
        ... (100 times)
    }
}
I compile with:
gcc ex.c --static -o ex
So basically, according to the generated assembly, addl $1, -8(%rbp) is executed 100,000 times (1000 loop iterations x 100 increments). Since it is both a read and a write, I was expecting 100k reads and 100k writes. However, Cachegrind counts these accesses only as reads, and Callgrind only as writes.
% valgrind --tool=cachegrind --I1=512,8,64 --D1=512,8,64 --L2=16384,8,64 ./ex
==15356== Cachegrind, a cache and branch-prediction profiler
==15356== Copyright (C) 2002-2012, and GNU GPL'd, by Nicholas Nethercote et al.
==15356== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==15356== Command: ./ex
==15356==
--15356-- warning: L3 cache found, using its data for the LL simulation.
==15356==
==15356== I refs: 111,535
==15356== I1 misses: 475
==15356== LLi misses: 280
==15356== I1 miss rate: 0.42%
==15356== LLi miss rate: 0.25%
==15356==
==15356== D refs: 104,894 (103,791 rd + 1,103 wr)
==15356== D1 misses: 557 ( 414 rd + 143 wr)
==15356== LLd misses: 172 ( 89 rd + 83 wr)
==15356== D1 miss rate: 0.5% ( 0.3% + 12.9% )
==15356== LLd miss rate: 0.1% ( 0.0% + 7.5% )
==15356==
==15356== LL refs: 1,032 ( 889 rd + 143 wr)
==15356== LL misses: 452 ( 369 rd + 83 wr)
==15356== LL miss rate: 0.2% ( 0.1% + 7.5% )
-
% valgrind --tool=callgrind --I1=512,8,64 --D1=512,8,64 --L2=16384,8,64 ./ex
==15376== Callgrind, a call-graph generating cache profiler
==15376== Copyright (C) 2002-2012, and GNU GPL'd, by Josef Weidendorfer et al.
==15376== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==15376== Command: ./ex
==15376==
--15376-- warning: L3 cache found, using its data for the LL simulation.
==15376== For interactive control, run 'callgrind_control -h'.
==15376==
==15376== Events : Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw
==15376== Collected : 111532 2777 102117 474 406 151 279 87 85
==15376==
==15376== I refs: 111,532
==15376== I1 misses: 474
==15376== LLi misses: 279
==15376== I1 miss rate: 0.42%
==15376== LLi miss rate: 0.25%
==15376==
==15376== D refs: 104,894 (2,777 rd + 102,117 wr)
==15376== D1 misses: 557 ( 406 rd + 151 wr)
==15376== LLd misses: 172 ( 87 rd + 85 wr)
==15376== D1 miss rate: 0.5% ( 14.6% + 0.1% )
==15376== LLd miss rate: 0.1% ( 3.1% + 0.0% )
==15376==
==15376== LL refs: 1,031 ( 880 rd + 151 wr)
==15376== LL misses: 451 ( 366 rd + 85 wr)
==15376== LL miss rate: 0.2% ( 0.3% + 0.0% )
Could someone give me a reasonable explanation? Would I be correct to consider there are in fact ~100k reads and ~100k writes (i.e. 2 cache accesses for an addl)?
Upvotes: 14
Views: 1443
Reputation: 1458
From the Cachegrind manual, section 5.7.1, "Cache Simulation Specifics":
Instructions that modify a memory location (e.g. inc and dec) are counted as doing just a read, i.e. a single data reference. This may seem strange, but since the write can never cause a miss (the read guarantees the block is in the cache) it's not very interesting.
Thus it measures not the number of times the data cache is accessed, but the number of times a data cache miss could occur.
It would seem that Callgrind's cache simulation logic differs from Cachegrind's. I would expect Callgrind to produce the same results as Cachegrind, so maybe this is a bug?
Upvotes: 3
Reputation: 270
Callgrind does not do full cache simulation by default. See here: http://valgrind.org/docs/manual/cl-manual.html#cl-manual.options.cachesimulation
To enable data read accesses you need to add --cache-sim=yes for Callgrind. Having said this, why even use Callgrind on this code? There is not a single function call (which is what Callgrind is for).
Upvotes: -1