DTL2020

Reputation: 81

Force an optimizing C compiler not to skip a 'load-prefetch' operation

Is there a 'standard' way to force a C compiler not to skip a 'dummy load' operation whose only purpose is to pull data into the CPU cache ('load prefetch')?

In assembly it is simply a load such as mov eax,[ebx], and the assembler cannot drop this instruction even if the data in eax is never visibly used.

But an optimizing C compiler can remove a load if it sees that the loaded data is not used in any further calculation.

So there are many ugly hacks for C compilers, such as performing unneeded operations on the pre-loaded data (summing it up, for example) and then 'comparing' the result. This is not nice, and it loads the CPU with unneeded instructions. Example:

    long long accumulator = 0;
    char *p;  /* points to the buffer to prefetch */
    for (int i = 0; i < PREFETCH_SIZE; i += 64)
    {
        accumulator += *(p + i);
    }
    if (accumulator < some_large_dummy_value)
    {
        // do something really useful
    }

Is there perhaps a special 'pragma' or some other way to force the C compiler not to skip a 'software guaranteed prefetch' loop like this one:

char *p;
for(int i=0; i < PREFETCH_SIZE; i+=64)
{
   char b = *(p+i);
}

I know about _mm_prefetch(), but it gives a weaker guarantee that the data really ends up in the cache: it may be dropped if it causes a TLB miss, it may be limited when the memory-operation buffers are overloaded, and so on.

The Intel optimization manual is full of 'software prefetch' examples, but they are given in assembly only.

This may be closely related to questions about locally disabling compiler optimizations, such as "How to prevent gcc optimizing some statements in C?". But most of those solutions are compiler-dependent. The 'volatile' qualifier looks promising. But does it work only for memory writes, or for reads too?

EDIT: The solution that finally worked:

#define CACHE_LINE_SIZE 64
void my_SWprefetch(char *p, int iSize)
{
    for(int i=0; i < iSize; i+=CACHE_LINE_SIZE)
    {
        (void)*(volatile char *)(p+i);
    }
}
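
For reference, a self-contained variant of the same idea might look like the sketch below. The name touch_cache_lines, the size_t parameters and the returned load count are assumptions made for illustration only, and 64 bytes is an assumed cache line size that is typical for x86 but not guaranteed everywhere.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64  /* assumed line size; check the target CPU */

/* Read one byte per cache line of [buf, buf + size) through a volatile
   lvalue, so the compiler has to emit every load.
   Returns the number of loads issued (hypothetical helper for testing). */
static size_t touch_cache_lines(const void *buf, size_t size)
{
    assert(buf != NULL || size == 0);
    const volatile unsigned char *p = (const volatile unsigned char *)buf;
    size_t loads = 0;
    for (size_t i = 0; i < size; i += CACHE_LINE_SIZE) {
        (void)p[i];  /* volatile read: must not be optimized away */
        ++loads;
    }
    return loads;
}
```

Calling touch_cache_lines(data, sizeof data) on a buffer right before the real work reads one byte from each line of that buffer. Whether this actually helps performance still depends on the CPU and the access pattern, so it should be measured.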

Upvotes: 0

Views: 136

Answers (1)

0___________

Reputation: 67713

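Cast your pointer to a pointer to a volatile object. Every access through a volatile lvalue counts as a side effect that the compiler must not remove, and this applies to reads as well as writes. Compare foo (volatile read) with bar (plain read) below: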

#define PREFETCH_SIZE 1024

int foo(char *x)
{
    char *p = x;
    for(int i=0; i < PREFETCH_SIZE; i+=64)
    {
        (void)*(volatile char *)(p+i);  // volatile read: the load is kept
    }
}

int bar(char *x)
{
    char *p = x;
    for(int i=0; i < PREFETCH_SIZE; i+=64)
    {
        (void)*(p+i);  // plain read with unused result: optimized away
    }
}

https://godbolt.org/z/W9onvvMdn

foo:
        mov     rax, rdi
        movzx   edx, BYTE PTR [rdi]
        movzx   edx, BYTE PTR [rdi+64]
        movzx   edx, BYTE PTR [rdi+128]
        movzx   edx, BYTE PTR [rdi+192]
        movzx   edx, BYTE PTR [rdi+256]
        movzx   edx, BYTE PTR [rdi+320]
        movzx   edx, BYTE PTR [rdi+384]
        movzx   edx, BYTE PTR [rdi+448]
        movzx   edx, BYTE PTR [rdi+512]
        movzx   edx, BYTE PTR [rdi+576]
        movzx   edx, BYTE PTR [rdi+640]
        movzx   edx, BYTE PTR [rdi+704]
        movzx   edx, BYTE PTR [rdi+768]
        movzx   edx, BYTE PTR [rdi+832]
        movzx   edx, BYTE PTR [rdi+896]
        movzx   eax, BYTE PTR [rax+960]
        ret
bar:
        ret
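
If a compiler-specific route is acceptable, another option is to feed the loaded value into an empty inline asm statement, which the optimizer has to treat as a use of that value. The sketch below assumes GCC or Clang (the __asm__ syntax is an extension, not standard C) and falls back to the volatile read elsewhere. The function name touch_lines_asm, the returned count and the 64-byte line size are hypothetical choices for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64  /* assumed line size */

/* Load one byte per cache line and keep each load alive.
   Returns the number of lines touched (hypothetical helper for testing). */
static size_t touch_lines_asm(const unsigned char *p, size_t size)
{
    assert(p != NULL || size == 0);
    size_t loads = 0;
    for (size_t i = 0; i < size; i += CACHE_LINE_SIZE) {
#if defined(__GNUC__)
        unsigned int v = p[i];
        /* Empty asm with v as a register input: emits no instruction,
           but the compiler must compute v, so the load stays. */
        __asm__ volatile("" : : "r"(v));
#else
        /* Portable fallback: the volatile read shown above. */
        (void)*(const volatile unsigned char *)(p + i);
#endif
        ++loads;
    }
    return loads;
}
```

The volatile cast remains the portable choice. The asm variant is only worth considering where volatile is undesirable for some other reason.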

Upvotes: 1
