Reputation: 3647
I'm writing some ARM inline assembly that works on a huge array of C structs in a loop and stores some data into another array.
The processor supports the PLD prefetch instruction.
If I access the data in sequential order, is there a performance gain if I use PLD to load the start address of the next struct in the array before I start processing the current one? Or should I prefetch the next-but-one struct in each iteration? Or prefetch a certain number of bytes ahead?
Does it also make sense to prefetch an address in the destination array?
Thanks!
Upvotes: 2
Views: 1280
Reputation: 170489
This depends heavily on the processor's inner workings. Prefetching may or may not improve performance; you have to review the documentation for your core.
Performance can improve if the processor has a separate subunit for loading data that works in parallel with the computation subunit. Also bear in mind that a prefetch instruction is still an instruction, so you should issue it only once per cache line, not more often; otherwise you just increase the processor load and waste time. If the load subunit is not separate and you issue prefetches anyway, performance can even decrease: nothing runs in parallel, so there is no gain, and the extra instructions cost time.
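To illustrate the "once per cache line" point, here is a minimal sketch in plain C rather than inline assembly. It uses the GCC/Clang builtin `__builtin_prefetch`, which on ARM targets is typically lowered to PLD. The struct `item_t`, the function `process`, the 64-byte line size and the prefetch distance of 4 lines are all assumptions made for the example, not values from the question; check the real line size in your core's documentation and tune the distance by measuring.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values: take the real line size from your core's documentation
   and tune the distance by benchmarking. */
#define CACHE_LINE_BYTES     64
#define PREFETCH_LINES_AHEAD 4

/* Hypothetical 8-byte struct standing in for the real one. */
typedef struct {
    int32_t a;
    int32_t b;
} item_t;

/* Adds the two fields of every source struct and stores the sum in dst.
   A prefetch hint for the source array is issued roughly once per cache
   line, not once per element. Nothing is prefetched for dst. */
void process(const item_t *src, int32_t *dst, size_t n)
{
    /* Number of structs that fit into one cache line (at least 1). */
    size_t per_line = CACHE_LINE_BYTES / sizeof(item_t);
    if (per_line == 0) {
        per_line = 1;
    }
    const size_t ahead = per_line * PREFETCH_LINES_AHEAD;

    for (size_t i = 0; i < n; i++) {
        /* Only hint when entering a new group of per_line elements, and
           only while the target element is still inside the array. */
        if (i % per_line == 0 && i + ahead < n) {
            /* Arguments: address, 0 = prefetch for read, 3 = keep in cache. */
            __builtin_prefetch(&src[i + ahead], 0, 3);
        }
        dst[i] = src[i].a + src[i].b;
    }
}
```

Calling `process(src, dst, n)` gives the same results with or without the hint, since a prefetch only affects timing. Whether it is actually faster can only be settled by timing both variants on the target processor.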
You should not prefetch data from the array you only write to - it's just a waste of time.
Upvotes: 1