Reputation: 1529
Trying to understand how pointer chasing is working (basically the access pattern) in the following benchmark:
https://github.com/google/multichase/blob/master/multichase.c
For the simple chase:
static void chase_simple(per_thread_t *t) {
void *p = t->x.cycle[0]; ----> here p is pointing to initial address
do {
x200(p = *(void **)p;) ----> didn't get this statement, is x200 some compiler built-in ?
} while (__sync_add_and_fetch(&t->x.count, 200));
// we never actually reach here, but the compiler doesn't know that
t->x.dummy = (uintptr_t)p;
}
where and how in the code, access pattern is made somewhat unpredictable so that H/W prefetcher doesn't guess it properly?
Upvotes: 3
Views: 466
Reputation: 364160
I'd assume x200
is a CPP macro that repeats its operand 200 times, for loop unrolling. Read x200
as "times 200". Possible implementation:
#define x5(x) x x x x x
...
#define x200(x) x100(x) x100(x)
That would make sense to amortize the cost of the atomic RMW to increment t->x.count
.
You can check by looking at C preprocessor output, e.g. gcc -E
, or search for the macro definition in the included files.
The actual statement being repeated,
p = *(void **)p;
, is just dereferencing a pointer to get another pointer. Often you'd have p = p->next;
if you had struct foo *p
, but with just a void*
you need a cast to keep the compiler happy and get an assembly instruction like mov rax, [rax]
.
Without a struct that can contain a memory that's a pointer to the same struct type, dereferencing a pointer is going to change the type. C doesn't have a way to declare an infinitely-indirect pointer that just produces the same type when you deref it.
where and how in the code, access pattern is made somewhat unpredictable so that H/W prefetcher doesn't guess it properly?
Nowhere in this function; it's just iterating through an existing linked list.
Look for the code that allocates space and stores pointers into it. Probably allocated as one large array, so it has control over the layout, unlike if it did many small mallocs. A random shuffling of the integers 0..n-1 could be mapped transformed to an array of pointers to pointers. Or something like that; perhaps you need to invert that mapping to make sure there aren't short cycles. Or maybe there's some simpler way to generate a linked list that covers all slots in a random order, with the last element pointing to the first so it's a closed loop that can be iterated as many times as you want.
Upvotes: 1