Reputation: 79
Given two long vectors of 2000 elements each, to be added on a machine with a 32-byte cache line (single-level cache) and a CPU, we have to add the two vectors so that the sum goes into a new vector.
e.g. c[0]=a[0]+b[0], c[1]=a[1]+b[1], c[2]=a[2]+b[2]......... c[1999]=a[1999]+b[1999]
I know that when c[0]=a[0]+b[0]
is done we will have a[0] to a[31], b[0] to b[31] and c[0] to c[31]
in cache (assuming 1-byte elements; with 4-byte ints a 32-byte line would hold only 8 elements per array). So we will get a cache miss at every 32nd element. Somebody asked me this:
Can you optimize it further for better performance, beyond what I stated above (a cache miss only at every 32nd element, thanks to locality)?
I am sure there is something more to this that I don't know.
Upvotes: 1
Views: 290
Reputation: 213029
Assuming a modern superscalar CPU with out-of-order execution, you can use a technique called software pipelining to help mitigate the cost of cache misses. E.g.
for (i = 0; i < N; ++i)
{
    c[i] = a[i] + b[i];
}
becomes:
ai = a[0];          // prologue: prime the pipeline with the first
bi = b[0];          // two pairs of loads and the first add
ci = ai + bi;
ai = a[1];
bi = b[1];
for (i = 0; i < N - 2; ++i)
{
    c[i] = ci;      // note that within this loop the order of operations has
    ci = ai + bi;   // been reversed - instead of load-add-store we now have
    ai = a[i + 2];  // store-add-load - this reduces serial dependencies
    bi = b[i + 2];
}
c[i] = ci;          // epilogue: drain the pipeline - store the two
ci = ai + bi;       // results that are still in flight
c[i + 1] = ci;
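To convince yourself that the transformation is correct you can check it against the naive loop. Here is a minimal, self-contained test sketch (the element type, the test data and the use of assert are my own assumptions, not part of the answer itself):

#include <assert.h>
#include <stddef.h>

#define N 2000

int a[N], b[N], c_naive[N], c_pipe[N];

int main(void)
{
    size_t i;
    int ai, bi, ci;

    // arbitrary test data
    for (i = 0; i < N; ++i)
    {
        a[i] = (int)i;
        b[i] = (int)(2 * i);
    }

    // naive version
    for (i = 0; i < N; ++i)
        c_naive[i] = a[i] + b[i];

    // software-pipelined version from above (requires N >= 3)
    ai = a[0];
    bi = b[0];
    ci = ai + bi;
    ai = a[1];
    bi = b[1];
    for (i = 0; i < N - 2; ++i)
    {
        c_pipe[i] = ci;
        ci = ai + bi;
        ai = a[i + 2];
        bi = b[i + 2];
    }
    c_pipe[i] = ci;
    ci = ai + bi;
    c_pipe[i + 1] = ci;

    // both versions must produce identical results
    for (i = 0; i < N; ++i)
        assert(c_naive[i] == c_pipe[i]);
    return 0;
}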
Typically a full cache miss costs hundreds of cycles (DRAM latency), so in this simple case overlapping the loads/stores with arithmetic will make only a very small difference, but for more complex examples software pipelining can sometimes be useful.
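If the misses themselves are what dominate, another technique worth knowing about is explicit software prefetch. A sketch using the GCC/Clang __builtin_prefetch extension (the function name and the prefetch distance of 16 elements are my own assumptions and would need tuning on a real machine):

#include <stddef.h>

#define PREFETCH_DIST 16  // a guess - tune for the target machine

void add_prefetch(int *c, const int *a, const int *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
    {
        if (i + PREFETCH_DIST < n)
        {
            // hint the CPU to start fetching these lines now,
            // so they are (hopefully) in cache when we reach them
            __builtin_prefetch(&a[i + PREFETCH_DIST]);
            __builtin_prefetch(&b[i + PREFETCH_DIST]);
        }
        c[i] = a[i] + b[i];
    }
}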
Having said that, most modern CPUs now have automatic (hardware) prefetch, so software pipelining has become less useful than it used to be. Also, many of these explicit optimisations are now handled automatically by good compilers.
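For example, a good compiler will usually vectorise and unroll the plain loop by itself if you give it aliasing information. A sketch using the C99 restrict qualifier (the function name and element type are my assumptions):

#include <stddef.h>

// restrict tells the compiler that c never aliases a or b, so at -O2/-O3
// it is free to vectorise and unroll this loop (e.g. with SSE/AVX on x86).
void add_vectors(int * restrict c, const int * restrict a,
                 const int * restrict b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}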
Upvotes: 1