Reputation: 1370
I need a simple ZeroMemory implementation using SSE (SSE2 preferred). Can someone help with that? I searched through SO and the net but did not find a direct answer.
Upvotes: 2
Views: 2251
Reputation: 6781
Almost all of the transistors in your CPU are used to somehow make memory access as fast as possible. The CPU is already doing an amazing job at all memory accesses, and instructions execute drastically faster than memory can be read or written.
Therefore, trying to beat memset is futile in most cases, because it is already limited by the speed of your memory (as others have mentioned).
Upvotes: 0
Reputation: 10937
If you want to speed up your code, then you must understand exactly how your CPU works and where the bottleneck is.
Here is my speed-optimized routine, just to show how it should be done.
On my PC it is about 5 times faster than yours (clearing a 1 MB memory block). Test it and ask if something isn't clear:
//edx = memory pointer must be 16 bytes aligned
//ecx = memory count must be multiple of 16
xorps xmm0, xmm0 //Clear xmm0
mov eax, ecx //Save ecx to eax
and ecx, 0FFFFFF80h //Keep only whole 128-byte blocks
jz @ClearRest //Less than 128 bytes to clear
@Aligned128BMove:
movdqa [edx], xmm0 //Clear first 16 bytes of 128 bytes
movdqa [edx + 10h], xmm0 //Clear second 16 bytes of 128 bytes
movdqa [edx + 20h], xmm0 //...
movdqa [edx + 30h], xmm0
movdqa [edx + 40h], xmm0
movdqa [edx + 50h], xmm0
movdqa [edx + 60h], xmm0
movdqa [edx + 70h], xmm0
add edx, 128 //inc mem pointer
sub ecx, 128 //dec counter
jnz @Aligned128BMove
@ClearRest:
and eax, 07Fh //Bytes left over after the 128-byte blocks
jz @Exit
@LoopRest:
movdqa [edx], xmm0
add edx, 16
sub eax, 16
jnz @LoopRest
@Exit:
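For readers who would rather not write inline assembly, the same loop structure can be sketched in C with SSE2 intrinsics. This is only an illustration under the same preconditions as the routine above (pointer 16-byte aligned, count a multiple of 16). The function name zero_aligned_sse2 is hypothetical and not from the answer, and no speed claim is made for it.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (name chosen for illustration): zero `count` bytes at `dst`.
   Preconditions mirror the assembly: dst is 16-byte aligned, count is a multiple of 16. */
static void zero_aligned_sse2(void *dst, size_t count)
{
    assert(((uintptr_t)dst & 15u) == 0);
    assert((count & 15u) == 0);

    const __m128i zero = _mm_setzero_si128(); /* the compiler picks the zeroing instruction */
    __m128i *p = (__m128i *)dst;

    /* Main loop: eight aligned 16-byte stores (128 bytes) per iteration. */
    size_t blocks = count / 128;
    while (blocks--) {
        _mm_store_si128(p + 0, zero);
        _mm_store_si128(p + 1, zero);
        _mm_store_si128(p + 2, zero);
        _mm_store_si128(p + 3, zero);
        _mm_store_si128(p + 4, zero);
        _mm_store_si128(p + 5, zero);
        _mm_store_si128(p + 6, zero);
        _mm_store_si128(p + 7, zero);
        p += 8;
    }

    /* Tail: the remaining 16-byte chunks (fewer than eight). */
    size_t rest = (count % 128) / 16;
    while (rest--) {
        _mm_store_si128(p++, zero);
    }
}
```

_mm_store_si128 is the aligned 16-byte store that corresponds to movdqa, so it faults on a misaligned pointer just like the assembly does. Calling it would look like zero_aligned_sse2(buffer, 1024) for a buffer that was allocated with 16-byte alignment.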
Upvotes: 1
Reputation: 33592
Is ZeroMemory() or memset() not good enough?
Disclaimer: Some of the following may be SSE3.
push to save an xmm reg
pxor to zero the xmm reg
movdqa or movntdq to do the write
pop to restore the xmm reg.
movntdq may appear to be faster because it tells the processor to not bring the data into your cache, but this can cause a performance penalty later if the data is going to be used. It may be more appropriate if you are scrubbing memory before freeing it (like you might do with SecureZeroMemory()).
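As a rough sketch of the non-temporal variant, here is how the movntdq write could look in C with SSE2 intrinsics (_mm_stream_si128 is the intrinsic for movntdq). The function name zero_streaming_sse2 is made up for this example, and it assumes the pointer is 16-byte aligned and the count is a multiple of 16. With intrinsics the compiler handles the xmm register allocation, so there is no explicit save and restore step.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics; also provides _mm_sfence */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (name chosen for illustration): zero `count` bytes at `dst`
   using non-temporal stores that bypass the cache.
   Assumes dst is 16-byte aligned and count is a multiple of 16. */
static void zero_streaming_sse2(void *dst, size_t count)
{
    assert(((uintptr_t)dst & 15u) == 0);
    assert((count & 15u) == 0);

    const __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)dst;

    for (size_t i = 0; i < count / 16; i++) {
        _mm_stream_si128(p + i, zero); /* non-temporal 16-byte store */
    }

    /* Non-temporal stores are weakly ordered; the fence makes them
       globally visible before any later stores. */
    _mm_sfence();
}
```

Whether this beats a regular store loop depends on the buffer size and on whether the memory is read again soon afterwards, so it is worth measuring against memset() before adopting it.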
Upvotes: 5