Anna

Reputation: 4159

How can I load values from memory without polluting the cache?

I want to read a memory location without polluting the cache. I am working on an x86 Linux machine. I tried using the MOVNTDQA assembler instruction:

  asm("movntdqa %[source], %[dest] \n\t"
      : [dest] "=x" (my_var) : [source] "m" (my_mem[0]) : "memory");

my_mem is an int* allocated with new, my_var is an int.

I have two problems with this approach:

  1. The code compiles, but I get an "Illegal Instruction" error when running it. Any ideas why?
  2. I am not sure what type of memory is allocated with new. I would assume WB (write-back). According to the documentation, the MOVNTDQA instruction only works with the USWC memory type. How can I find out what memory type I am working with?

To summarize, my question is:

How can I read a memory location without polluting the cache on an X86 machine? Is my approach in the right direction, and can it be fixed to work?

Thanks.

Upvotes: 15

Views: 2037

Answers (2)

Gunther Piez

Reputation: 30439

The problem with the movntdqa instruction with %%xmm as target (loading from memory) is that this insn is only available with SSE4.1 and on. This means newer Core 2 (45 nm) or i7 only so far. On a CPU without SSE4.1 it raises exactly the "Illegal Instruction" error you are seeing. The other way around (storing data to memory) is available in earlier SSE versions.
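To avoid hitting the illegal-instruction trap, the CPU can be asked at run time whether it supports SSE4.1 before any movntdqa is executed. The following is a minimal sketch, assuming GCC or Clang on x86 (the `__builtin_cpu_supports` builtin is compiler-specific); `cpu_has_sse41` is a hypothetical helper name, not something from the question.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true if the running CPU reports SSE4.1.
   Relies on the GCC/Clang x86 builtins __builtin_cpu_init and
   __builtin_cpu_supports; other compilers need a CPUID-based check. */
bool cpu_has_sse41(void) {
    __builtin_cpu_init();  /* initialise the CPU feature data; safe to call repeatedly */
    return __builtin_cpu_supports("sse4.1") != 0;
}
```

A caller would then fall back to an ordinary load when `cpu_has_sse41()` returns false.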

For this instruction, the processor moves the data into one of a very few, very small read buffers (Intel doesn't specify the exact size, but assume it is in the range of 16 bytes), where it is readily available, but from which it gets kicked out after a few other loads.

And it does not pollute the other caches, so if you have streaming data, your approach is viable.
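As a rough illustration of the streaming load, here is a sketch that uses the `_mm_stream_load_si128` intrinsic instead of hand-written inline asm. It assumes GCC or Clang (for the `target` attribute) and a CPU with SSE4.1; the helper names are hypothetical. Two points worth keeping in mind: the source address must be 16-byte aligned or the instruction faults, and on ordinary write-back memory (what new or malloc normally returns) the processor may treat the non-temporal hint as a plain load.

```c
#include <smmintrin.h>  /* SSE4.1 intrinsics: _mm_stream_load_si128 */
#include <stdint.h>
#include <stdlib.h>     /* aligned_alloc, free */

/* Hypothetical helper: load 16 bytes with MOVNTDQA and return the first int.
   src must be 16-byte aligned, otherwise the instruction faults. */
__attribute__((target("sse4.1")))
int32_t stream_load_first_int(const int32_t *src) {
    __m128i v = _mm_stream_load_si128((__m128i *)src);
    return _mm_cvtsi128_si32(v);  /* extract the lowest 32 bits */
}

/* Hypothetical helper: allocate room for count ints on a 16-byte boundary. */
int32_t *alloc_aligned_ints(size_t count) {
    size_t bytes = count * sizeof(int32_t);
    bytes = (bytes + 15) & ~(size_t)15;  /* round up to a multiple of the alignment */
    return aligned_alloc(16, bytes);
}
```

Memory obtained from `alloc_aligned_ints` is released with `free`. Note that the whole 16-byte block is read, so the buffer must be at least that large.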

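Remember that streaming accesses are weakly ordered, so you need a fence afterwards if anything else depends on the ordering: sfence after streaming stores, mfence around streaming loads (sfence does not order loads).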

Prefetching comes in two main flavours: prefetcht0 (prefetches data into all cache levels) and prefetchnta (prefetches non-temporal data while trying to minimise cache pollution). Usually prefetching into all caches is the right thing to do; for a streaming data loop the latter would be better, provided you consistently use the streaming instructions.

You use it with the address of an object you want to use in the near future, usually some iterations ahead if you have a loop. The prefetch insn doesn't wait or block, it just makes the processor start getting the data at the specified memory location.
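A loop that prefetches a few iterations ahead might look like the sketch below. It uses the `_mm_prefetch` intrinsic with `_MM_HINT_NTA` (which maps to prefetchnta); the function name, the prefetch distance, and the 64-byte cache line size are assumptions for illustration and would need tuning on real hardware.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_NTA */

/* Hypothetical example: sum an array while prefetching ahead with the
   non-temporal hint. */
long sum_with_nta_prefetch(const int *data, size_t n) {
    const size_t ints_per_line = 64 / sizeof(int);  /* assumes 64-byte cache lines */
    const size_t ahead = 4 * ints_per_line;         /* assumed distance: 4 lines ahead */
    long sum = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Issue one prefetch per cache line, and only for addresses inside the array. */
        if (i % ints_per_line == 0 && i + ahead < n)
            _mm_prefetch((const char *)&data[i + ahead], _MM_HINT_NTA);
        sum += data[i];
    }
    return sum;
}
```

The prefetch is only a hint, so the result is the same with or without it; only the timing and cache footprint can differ.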

Upvotes: 8

moonshadow

Reputation: 89085

MOVNTDQA is only available with SSE4.1 and later.

Why are you trying to avoid using the cache? CPUs are generally pretty good at deciding what to kick out of the cache and when. If you do genuinely need to, one way would be to arrange for an alias of the memory area you are reading from to be mapped into your address space with caching disabled, and to read from there.

If what you are trying to achieve is actually to minimise your code's impact on another function's working set being held in cache at the time, this should be doable by issuing appropriate prefetch and invalidate instructions.
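One possible reading of "invalidate instructions" is clflush, which evicts a cache line from every cache level (writing it back first if it is dirty). The sketch below reads a value normally and then flushes its line; the function name is hypothetical, and this is only one way to interpret the suggestion. clflush is expensive and also evicts the line from other cores' caches, so it is worth measuring before relying on it.

```c
#include <emmintrin.h>  /* SSE2: _mm_clflush */

/* Hypothetical helper: read *p with an ordinary load, then evict the
   cache line that holds it. */
int read_then_evict(const int *p) {
    int value = *p;   /* ordinary load; brings the line into the cache */
    _mm_clflush(p);   /* flush that line from all cache levels */
    return value;
}
```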

Upvotes: 0
