Reputation: 13417
Regarding non-temporal writes and write-combining techniques, I have the following code
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_set_epi8, _mm_stream_si128 */

/* Fill one 64-byte cache line starting at p with the byte value c,
   using non-temporal (streaming) stores. */
void setbytes(char *p, int c)
{
    __m128i i = _mm_set_epi8(c, c, c, c,
                             c, c, c, c,
                             c, c, c, c,
                             c, c, c, c);
    _mm_stream_si128((__m128i *)&p[0],  i);
    _mm_stream_si128((__m128i *)&p[16], i);
    _mm_stream_si128((__m128i *)&p[32], i);
    _mm_stream_si128((__m128i *)&p[48], i);
}
taken from here, where it is written:
To summarize, this code sequence not only avoids reading the cache line before it is written, it also avoids polluting the cache with data which might not be needed soon. This can have huge benefits in certain situations.
My question is: which cache line is being avoided here? The one that stores the contents of the i variable, or the one that p points to (the one that actually gets modified)?
Upvotes: 3
Views: 1384
Reputation: 56
about: "avoids reading the cache line before it is written"
This statement refers to the "write allocate" policy for handling writes that miss the cache; all modern x86 processors do this.

It goes like this: software writes to memory using a normal mov instruction. If that address is already cached, the cache is updated and there is no DRAM access at all. However, if the data is not in the cache, the processor first reads that cache line from DRAM, then merges the data from the mov instruction into the cached line. The processor postpones writing that data back out to DRAM for as long as possible.

The end result is counter-intuitive: software executes a write (mov) instruction, and a single DRAM read (burst) results. If this pattern repeats, the cache eventually fills up and evictions are needed to make room for the reads. In that case, a DRAM write burst for an unrelated cache line's address is followed by a read of the address the software is writing.

This explains why non-temporal stores give roughly 2X the performance when filling a large buffer: only half as many DRAM accesses occur compared to filling the buffer with mov.
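To make the DRAM-traffic difference concrete, here is a minimal sketch (not from the question) that contrasts a plain byte-fill loop with a streaming fill. The names fill_regular, fill_stream, buf and n are made up for illustration, and the streaming version assumes buf is 16-byte aligned and n is a multiple of 16:

#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Regular fill: each store that misses the cache triggers a read of the
   cache line from DRAM (write allocate), plus a write-back later when the
   line is evicted. */
static void fill_regular(char *buf, int c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (char)c;
}

/* Non-temporal fill: streaming stores bypass the cache, so each cache line
   costs only one DRAM write burst -- about half the DRAM traffic of the
   version above when the buffer is not already cached. */
static void fill_stream(char *buf, int c, size_t n)
{
    __m128i v = _mm_set1_epi8((char)c);
    for (size_t i = 0; i < n; i += 16)
        _mm_stream_si128((__m128i *)&buf[i], v);
    _mm_sfence();  /* make the weakly ordered streaming stores globally visible */
}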
Upvotes: 4
Reputation: 26171
Streaming prevents polluting the cache if the destination address isn't already cached; otherwise it just updates the cache with the new values written to the address backed by that cache line.

So in your example, if you have not read from p (or you have flushed it from the cache with CLFLUSH), a streaming store keeps the data being written to where p points from being loaded into the cache, i.e. no cache lines will be allocated for the addresses written to.
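As a rough usage sketch (my own, not from the question): the caller below uses the setbytes function from the question on a freshly allocated buffer that has never been read, so no cache lines exist for it and none get created. The names buf and SIZE are made up, aligned_alloc requires C11, and the 16-byte alignment is needed because _mm_stream_si128 takes aligned addresses:

#include <emmintrin.h>
#include <stdlib.h>

int main(void)
{
    enum { SIZE = 4096 };                 /* hypothetical buffer size, multiple of 64 */
    char *buf = aligned_alloc(16, SIZE);  /* _mm_stream_si128 needs 16-byte alignment */
    if (!buf)
        return 1;

    /* buf has never been read (and was not flushed with CLFLUSH because it
       was never cached), so the streaming stores below go straight to
       memory without allocating cache lines for these addresses. */
    for (int off = 0; off < SIZE; off += 64)
        setbytes(buf + off, 0);

    _mm_sfence();  /* order the weakly ordered streaming stores before later accesses */

    free(buf);
    return 0;
}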
Upvotes: 1