Reputation: 180
I am developing a cache-efficient transpose algorithm using tiling, and I noticed that when I allocate the memory with malloc I get worse performance than when using posix_memalign. More specifically:
Using malloc: 98.7 ms
Using posix_memalign: 86.4 ms (for data alignments of 32, 64, 128, 256, 512, 1024, 2048, 4096)
I am allocating an array of 32-bit integers.
I can't explain why posix_memalign with alignment x, for any x from 32 to 4096, always gives more or less the same performance. My algorithm prefetches cache lines (64 bytes), so I would expect x = 64 to give the best numbers.
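To make the setup concrete, here is a minimal sketch of how the aligned buffer could be allocated and how the alignment actually returned could be inspected. The helper names alloc_aligned_i32 and actual_alignment are hypothetical and are not taken from my real code; only posix_memalign and free are standard calls.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n 32-bit integers at the requested alignment.
 * posix_memalign requires the alignment to be a power of two and a multiple
 * of sizeof(void *). Returns NULL on failure; release the block with free(). */
static int32_t *alloc_aligned_i32(size_t n, size_t alignment)
{
    void *p = NULL;
    if (posix_memalign(&p, alignment, n * sizeof(int32_t)) != 0)
        return NULL;
    /* Sanity check: the address is a multiple of the requested alignment. */
    assert((uintptr_t)p % alignment == 0);
    return p;
}

/* Hypothetical helper: largest power-of-two alignment (capped at 4096)
 * that the address p actually satisfies. */
static size_t actual_alignment(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    size_t align = 1;
    while (align < 4096 && addr % (align * 2) == 0)
        align *= 2;
    return align;
}
```

Calling actual_alignment on each buffer would show whether, for example, a request for 32 happened to return a block that is also 64-byte aligned. If so, that run would start on a cache-line boundary just like the x = 64 run, although I have not confirmed that this explains the flat timings.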
Upvotes: 3
Views: 1397
Reputation: 37
I did a simple test, and performance was best with 8-byte alignment. In my test, malloc aligned to 8 bytes by default. I tried posix_memalign to get a larger alignment, but that did not improve performance; the result differed only slightly from the 8-byte-aligned case.
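The default alignment of malloc depends on the platform and C library, so it may be worth measuring rather than assuming 8 bytes. Below is a rough sketch of one way to check it; observed_malloc_alignment is a hypothetical helper name, and the 4096 cap is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate `samples` blocks of `size` bytes with malloc
 * and return the smallest power-of-two alignment (capped at 4096) observed
 * across the returned addresses. Returns 0 if any allocation fails. */
static size_t observed_malloc_alignment(size_t samples, size_t size)
{
    assert(samples > 0 && size > 0);

    /* Keep every block alive until the end so the addresses differ. */
    void **blocks = malloc(samples * sizeof *blocks);
    if (blocks == NULL)
        return 0;

    size_t min_align = 4096;
    size_t got = 0;
    for (; got < samples; got++) {
        blocks[got] = malloc(size);
        if (blocks[got] == NULL)
            break;
        uintptr_t addr = (uintptr_t)blocks[got];
        size_t align = 1;
        while (align < 4096 && addr % (align * 2) == 0)
            align *= 2;
        if (align < min_align)
            min_align = align;
    }

    for (size_t i = 0; i < got; i++)
        free(blocks[i]);
    free(blocks);
    return got == samples ? min_align : 0;
}
```

For example, observed_malloc_alignment(16, 100) reports the weakest alignment seen over 16 live allocations of 100 bytes. This is only an observation for that run, not a guarantee from the allocator.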
Upvotes: 1