Reputation: 81
Is there a standard way to force a C compiler not to eliminate a 'dummy load' whose only purpose is to pull data into the CPU cache (a 'load prefetch')?
In assembly this is simply a load instruction such as
mov eax,[ebx]
and the assembler will not drop this instruction even if the value in eax is never used afterwards.
An optimizing C compiler, however, may remove a load if it sees that the loaded value is not used in any later computation.
So there are many ugly hacks, such as performing otherwise unneeded operations on the preloaded data (summing it up and then comparing the result against something), but they are not nice and burden the CPU with extra instructions. Example:
long long accumulator=0;
char *p; /* assumed to point at the data to prefetch */
for(int i=0; i < PREFETCH_SIZE; i+=64)
{
    accumulator += *(p+i);
}
if (accumulator < some_large_dummy_value)
{
    // do something really useful
}
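For comparison, a lower-overhead form of this hack exists on GCC and Clang (a compiler extension, not standard C, so those compilers are assumed): pass each loaded value to an empty asm volatile statement that claims to read it. The compiler must then materialize the value, so it keeps the load, but no summing or comparing instructions are emitted. The names keep_value and touch_lines_asm are hypothetical.

```c
#include <stddef.h>

/* GCC/Clang extension (assumed compiler): an empty asm statement that
   claims to use v in a register, so the load producing v must be kept.
   The asm string is empty, so it emits no instructions of its own. */
static inline void keep_value(char v)
{
    __asm__ volatile("" : : "r"(v));
}

/* Hypothetical helper: load one byte at each line-sized step of
   [p, p + size) and keep it alive via keep_value. line must be > 0.
   Returns the number of loads issued. */
size_t touch_lines_asm(const char *p, size_t size, size_t line)
{
    size_t n = 0;
    for (size_t i = 0; i < size; i += line) {
        keep_value(p[i]);
        n++;
    }
    return n;
}
```

For example, touch_lines_asm(buf, 1000, 64) issues 16 loads (offsets 0, 64, ..., 960).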
Is there perhaps a special pragma, or some other way, to force the C compiler to keep a 'software guaranteed prefetch' like this:
char *p;
for(int i=0; i < PREFETCH_SIZE; i+=64)
{
    char b = *(p+i); /* dead load: an optimizing compiler removes it */
}
I know about _mm_prefetch(), but it gives a weaker guarantee that the data really ends up in the cache: the hint may be dropped if it would cause a TLB miss, it may be discarded when the memory-operation buffers are full, and so on.
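For reference, the hint-based approach can be sketched as follows, assuming GCC or Clang: on x86 it calls _mm_prefetch with _MM_HINT_T0 from xmmintrin.h, and on other targets the __builtin_prefetch builtin. Unlike a real load, neither call is guaranteed to bring the line into the cache. The names prefetch_hint and hint_prefetch are hypothetical.

```c
#include <stddef.h>
#if defined(__x86_64__) || defined(__i386__)
#include <xmmintrin.h>   /* _mm_prefetch, _MM_HINT_T0 */
#endif

/* Non-binding prefetch hint for the cache line containing addr. */
static inline void prefetch_hint(const char *addr)
{
#if defined(__x86_64__) || defined(__i386__)
    _mm_prefetch(addr, _MM_HINT_T0);
#else
    /* GCC/Clang builtin: 0 = prefetch for read, 3 = high temporal locality */
    __builtin_prefetch(addr, 0, 3);
#endif
}

/* Hypothetical helper: issue one hint per line-sized step of [p, p + size).
   line must be > 0. Returns the number of hints issued; the CPU may still
   drop any of them. */
size_t hint_prefetch(const char *p, size_t size, size_t line)
{
    size_t n = 0;
    for (size_t i = 0; i < size; i += line) {
        prefetch_hint(p + i);
        n++;
    }
    return n;
}
```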
The Intel optimization manual is full of software-prefetch examples, but they are given only in assembly.
This is closely related to questions about locally disabling compiler optimizations, such as "How to prevent gcc optimizing some statements in C?", but most of the solutions there are compiler-dependent. The volatile qualifier looks promising, but does it work only for memory writes, or for reads too?
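On the write side of the question: a store to a volatile object is also a side effect, so another standard-C option is to copy each loaded byte into a volatile 'sink' variable. The compiler cannot perform the store without knowing the value, so the load has to stay; the cost is one extra store per cache line. The names prefetch_sink and touch_lines_sink are hypothetical.

```c
#include <stddef.h>

/* Hypothetical sink: every store to a volatile object must be performed,
   and performing it requires the value, i.e. the preceding plain load. */
static volatile char prefetch_sink;

/* Hypothetical helper: load one byte per line-sized step of [p, p + size)
   and store it to the sink. line must be > 0. Returns the number of loads. */
size_t touch_lines_sink(const char *p, size_t size, size_t line)
{
    size_t n = 0;
    for (size_t i = 0; i < size; i += line) {
        prefetch_sink = p[i];   /* plain load, then volatile store */
        n++;
    }
    return n;
}
```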
EDIT: Final working solution:
#define CACHE_LINE_SIZE 64
void my_SWprefetch(char *p, int iSize)
{
    for(int i=0; i < iSize; i+=CACHE_LINE_SIZE)
    {
        (void)*(volatile char *)(p+i); /* volatile read: the load is always emitted */
    }
}
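One subtlety with stepping from p itself: if p is not aligned to a cache line, the buffer can overlap one more line than the loop visits (for example, 64 bytes starting 32 bytes into a line span two lines, but only p[0] is read). A sketch that reads the first byte and then the first byte of every following line, assuming a 64-byte line and staying inside the buffer, could look like this; sw_prefetch_lines is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64  /* assumed cache-line size in bytes */

/* Hypothetical helper: touch every cache line overlapping [buf, buf + size)
   with one volatile read per line. Only bytes inside the buffer are read.
   Returns the number of lines touched. */
size_t sw_prefetch_lines(const void *buf, size_t size)
{
    const volatile char *p = (const volatile char *)buf;
    if (size == 0)
        return 0;

    (void)p[0];   /* line containing the first byte */
    size_t n = 1;

    /* Distance from buf to the start of the next cache line. */
    size_t i = CACHE_LINE_SIZE - (size_t)((uintptr_t)buf % CACHE_LINE_SIZE);
    for (; i < size; i += CACHE_LINE_SIZE) {
        (void)p[i];   /* first byte of each following line */
        n++;
    }
    return n;
}
```

For a line-aligned buffer this touches the same lines as my_SWprefetch; the two differ only when the start address is unaligned.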
Upvotes: 0
Views: 136
Reputation: 67713
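Cast your pointer to a pointer to a volatile object. Accessing a volatile object counts as a side effect in C, so the compiler has to perform the read even when the value is discarded; this applies to reads as well as writes.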
#define PREFETCH_SIZE 1024
int foo(char *x)
{
    char *p = x;
    for(int i=0; i < PREFETCH_SIZE; i+=64)
    {
        (void)*(volatile char *)(p+i); /* volatile read: kept by the compiler */
    }
}
int bar(char *x)
{
    char *p = x;
    for(int i=0; i < PREFETCH_SIZE; i+=64)
    {
        (void)*(p+i); /* plain read of a discarded value: optimized away */
    }
}
https://godbolt.org/z/W9onvvMdn
foo:
mov rax, rdi
movzx edx, BYTE PTR [rdi]
movzx edx, BYTE PTR [rdi+64]
movzx edx, BYTE PTR [rdi+128]
movzx edx, BYTE PTR [rdi+192]
movzx edx, BYTE PTR [rdi+256]
movzx edx, BYTE PTR [rdi+320]
movzx edx, BYTE PTR [rdi+384]
movzx edx, BYTE PTR [rdi+448]
movzx edx, BYTE PTR [rdi+512]
movzx edx, BYTE PTR [rdi+576]
movzx edx, BYTE PTR [rdi+640]
movzx edx, BYTE PTR [rdi+704]
movzx edx, BYTE PTR [rdi+768]
movzx edx, BYTE PTR [rdi+832]
movzx edx, BYTE PTR [rdi+896]
movzx eax, BYTE PTR [rax+960]
ret
bar:
ret
Upvotes: 1