Reputation: 1191
I profiled my code and found that one inline function accounts for about 8% of the samples. The function converts matrix subscripts to a linear index, much like the MATLAB function sub2ind:
inline int sub2ind(const int sub_height, const int sub_width, const int width) {
    return sub_height * width + sub_width;
}
I suspect the compiler is not performing inline expansion, but I don't know how to verify that.
Is there any way to improve this, or to explicitly make the compiler inline the function?
Upvotes: 5
Views: 281
Reputation: 55685
As @user3528438 mentioned, you can look at the assembly output. Consider the following example:
inline int sub2ind(const int sub_height, const int sub_width, const int width) {
    return sub_height * width + sub_width;
}

int main() {
    volatile int n[] = {1, 2, 3};
    return sub2ind(n[0], n[1], n[2]);
}
Compiling it without optimization (g++ -S test.cc) results in the following code, with sub2ind not inlined:
main:
.LFB1:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $32, %rsp
movl $1, -16(%rbp)
movl $2, -12(%rbp)
movl $3, -8(%rbp)
movq -16(%rbp), %rax
movq %rax, -32(%rbp)
movl -8(%rbp), %eax
movl %eax, -24(%rbp)
movl -24(%rbp), %edx
movl -28(%rbp), %ecx
movl -32(%rbp), %eax
movl %ecx, %esi
movl %eax, %edi
call _Z7sub2indiii # call to sub2ind
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
while compiling with optimization (g++ -S -O3 test.cc) results in sub2ind being inlined and mostly optimized away:
main:
.LFB1:
.cfi_startproc
movl $1, -24(%rsp)
movl $2, -20(%rsp)
movq -24(%rsp), %rax
movl $3, -16(%rsp)
movq %rax, -40(%rsp)
movl $3, -32(%rsp)
movl -32(%rsp), %eax
movl -36(%rsp), %edx
movl -40(%rsp), %ecx
imull %ecx, %eax
addl %edx, %eax
ret
.cfi_endproc
So if you suspect that your function is not being inlined, first make sure that optimization is enabled in your compiler options.
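One way to run this check from the command line, using the example above (a sketch assuming g++ is on your PATH; the file names are arbitrary):

```shell
# Recreate the example and compile it to assembly at -O0 and -O3.
cat > test.cc <<'EOF'
inline int sub2ind(const int sub_height, const int sub_width, const int width) {
    return sub_height * width + sub_width;
}
int main() {
    volatile int n[] = {1, 2, 3};
    return sub2ind(n[0], n[1], n[2]);
}
EOF
g++ -S -O0 -o test_O0.s test.cc
g++ -S -O3 -o test_O3.s test.cc

# A surviving call to the mangled name means sub2ind was NOT inlined.
grep -q 'call.*_Z7sub2ind' test_O0.s && echo "call present at -O0"
grep -q 'call.*_Z7sub2ind' test_O3.s || echo "no call at -O3"
```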
Upvotes: 1
Reputation:
Did you remember to compile with optimizations? Some compilers have an attribute to force inlining even when the compiler doesn't want to: see this question.
But it has probably been inlined already; you can have your compiler output the assembly and check for sure that way.
It is not implausible that index calculations are a significant fraction of your runtime: if, for example, your algorithm reads from a matrix, does a little computation, and writes back, then the index arithmetic really can account for a meaningful share of your compute time.
Or, you may have written your code in a way that prevents the compiler from proving that width stays constant throughout your loops*, so it has to reread it from memory every time, just to be sure. Try copying width into a local variable and using that in your inner loops.
Now, you've said that this takes 8% of your time -- which means the most you could possibly gain is an 8% improvement in runtime, and in practice much less. If that's really worth it, then the thing to do is probably to fundamentally change how you iterate through the array.
E.g., if width is a compile-time constant, you could make it explicitly so, e.g. as a template parameter, and your compiler might be able to do more clever things with the multiplication.

*: You could have done something silly, like putting the data structure for your matrix in the very memory where you're storing the matrix entries! So when you update the matrix, you might change the width. The compiler has to guard against these loopholes, so it can't do optimizations it 'obviously should' be able to do. And sometimes, what is a loophole in one context can well be the programmer's obvious intent in another context. Generally speaking, these sorts of loopholes tend to be all over the place, and the compiler is better at finding them than humans are at noticing them.
Upvotes: 5