Reputation: 51545
in certain applications, I have need to collapse nested loops into one while retaining individual index information.
for j in N:
for i in M:
... A(i,j) ...
// Collapse the loops
for ij in MN:
... A(i,j) ...
so have looked at the obvious ways to recover i,j from ij using division/modulo (expensive operation) and using if statements (breaks vectorization, branch prediction problems).in the end i came up with the following (using C-style comparisons):
j += (i == m)
i *= (i != m)
++i, ++ij
is there perhaps a even better way to do that? thanks
Upvotes: 2
Views: 2191
Reputation: 2608
I am not sure why do you want to collapse loops. Make sure the most inner loop has a high trip count (by loop inversion) and make sure your data is sequential in memory. I've seen algorithms run 10 times faster when these two conditions are met.
Upvotes: 0
Reputation: 75673
In general, it offers no performance advantage to collapse the loop as described.
Compilers do sometimes collapse such loops, but typically in unexpected ways.
In particular languages, or on particular platforms, you can speed up loops in general by:
But in all cases you have to have profiled your code to see that such efforts are warranted.
Generally, in my experience, nested loops like this are dominated by:
But that might not be applicable advice on your problem domain and platform. Profile!
Upvotes: 8
Reputation: 10467
Going other way might be cheaper.
for j in N:
for i in M:
ij=j*i+j
Upvotes: 0