Reputation: 1006
I am using Cachegrind to benchmark the cache behaviour of two search algorithms that operate on a sorted range of items. I have n items in one vector, and another vector that holds all valid indices. I shuffle the second vector with std::random_shuffle and then perform n successful lookups on the items in the first vector. The function I am benchmarking looks roughly as follows:
// Needs <vector>, <numeric>, <algorithm>, <iterator>, <cstdlib> and <ctime>;
// my_search is defined elsewhere.
template <typename Iterator>
void lookup_in_random_order(Iterator begin, Iterator end)
{
    const std::size_t N = std::distance(begin, end);
    std::vector<std::size_t> idx(N);
    std::iota(idx.begin(), idx.end(), 0);

    std::srand(std::time(0));
    std::random_shuffle(idx.begin(), idx.end());

    // Warm the cache -- I don't care about measuring this loop.
    for(std::size_t i = 0; i < N; ++i)
        my_search(begin, end, idx[i]);

    std::random_shuffle(idx.begin(), idx.end());

    // This one I would care about!
    for(std::size_t i = 0; i < N; ++i)
    {
        int s = idx[i];
        // Especially this line, of course.
        my_search(begin, end, s);
    }
}
I compile my code with g++ (with -g and -O2), then run Cachegrind and cg_annotate on the result.
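Concretely, the commands are roughly the following (the source and binary names here are just placeholders for my actual ones):

g++ -g -O2 -o bench bench.cpp
valgrind --tool=cachegrind ./bench
cg_annotate cachegrind.out.<pid> bench.cpp

The annotated output I get looks something like this: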
Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
. . . . . . . . . template <typename Iterator>
17 2 2 0 0 0 6 0 0 void lookup_in_random_order(Iterator begin, Iterator end)
. . . . . . . . . {
. . . . . . . . . const std::size_t N = std::distance(begin, end);
. . . . . . . . . std::vector<std::size_t> idx(N);
. . . . . . . . . std::iota(idx.begin(), idx.end(), 0);
. . . . . . . . .
4 0 0 0 0 0 2 1 1 std::srand(std::time(0));
. . . . . . . . . std::random_shuffle(idx.begin(), idx.end());
. . . . . . . . .
3,145,729 0 0 0 0 0 0 0 0 for(std::size_t i = 0; i < N; ++i)
. . . . . . . . . my_search(begin, end, idx[i]);
. . . . . . . . .
. . . . . . . . . std::random_shuffle(idx.begin(), idx.end());
. . . . . . . . .
3,145,729 1 1 0 0 0 0 0 0 for(std::size_t i = 0; i < N; ++i)
. . . . . . . . . {
1,048,575 0 0 1,048,575 132,865 131,065 0 0 0 int s = idx[i];
. . . . . . . . . my_search(begin, end, s);
. . . . . . . . . }
7 0 0 6 1 1 0 0 0 }
For some reason, some lines (especially the most interesting one!) consist of dots. Now, the Cachegrind manual says "Events not applicable for a line are represented by a dot. This is useful for distinguishing between an event which cannot happen, and one which can but did not."
How should this be interpreted? My first idea was that the compiler might be optimizing my searches away. That seemed unlikely, because the program did spend quite a bit of time running. Still, I tried compiling without the -O2 flag, and it seemed to help in the sense that every line with a call to my_search now recorded some numbers (no more dots!). However, this doesn't seem like the right way to go, for obvious reasons.
In general, is there a way I can tell Cachegrind that "look at this line especially, I am very interested how many cache misses it causes"?
Upvotes: 0
Views: 95
Reputation: 215
My guess is that -O2 allows the compiler to inline the functions on the lines where you see the dots. Cachegrind will not see the inlined function calls, as the calls have disappeared. Try "-fno-inline" (see the compiler options).
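For example, assuming the benchmark is built as in the question, the compile line would become something like this (file and binary names are placeholders):

g++ -g -O2 -fno-inline -o bench bench.cpp

If disabling inlining globally distorts the results too much, GCC also lets you mark just the search function with __attribute__((noinline)), so that only its calls stay visible while everything else is optimized as before.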
Of course you will probably have different cache performance numbers with and without inlining.
Upvotes: 1