Damon
Reputation: 70186

std::function lambda optimization

std::function is known to have performance issues because it may do heap allocations. Admittedly, if we're being 100% honest, one heap allocation should hardly be a problem in most cases... but let's just assume that a heap allocation is undesirable or prohibitive in a particular scenario. Maybe we're doing a few million callbacks and don't want a few million heap allocations for that, whatever.

So... we want to avoid that heap allocation.

The Dr. Dobbs article Efficient Use of Lambda Expressions and std::function gives a recommendation on optimizing the use of std::function by taking advantage of the small object optimization that is recommended by the standard and implemented in every mainstream standard library.

The article goes to some length explaining how the standard library must copy the functor since the std::function object might outlive the original functor (though you can use std::ref if you are sure it doesn't), which would be bad mojo. Also, captures need to be copied, and here is the problem: the exact type of the closure (or its size) is not known beforehand, as it could be any type of closure with any number of captures, so some compromise must be made. Up to a certain size, the captures are saved in a store inside the function object; beyond that, they are dynamically allocated. The store is small, anywhere from 12 to 16 bytes, so assuming a 64-bit build, a maximum of two pointers (not counting the actual function pointer).

Dr. Dobbs thus recommends (and several other sites pick up that advice, seemingly without much of an objection) capturing a reference to a struct that holds references to what you actually want to capture. That way, you only capture one reference, which is just perfect, since it will always fit into the small object store.
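For illustration, the advice as described above might look like the following sketch. The names `Context`, `eval_with_context` and `demo_context` are hypothetical and not taken from the article, and whether the closure really lands in the small object store depends on the standard library implementation.

```cpp
#include <cassert>
#include <functional>

// Hypothetical aggregate bundling references to everything the callback needs.
struct Context {
    double& a;
    double& b;
    double& c;
};

double eval_with_context(double x) {
    double a = 1.0, b = 2.0, c = 3.0;
    Context ctx{a, b, c};

    // The closure captures a single reference, so it is typically pointer-sized
    // and should fit into the small object store of common implementations.
    auto closure = [&ctx](double v) { return ctx.a * v * v + ctx.b * v + ctx.c; };

    std::function<double(double)> f = closure;

    // Valid only because f is called while ctx, a, b and c are still alive.
    return f(x);
}

// Small self-check; call it from main() when trying this out.
void demo_context() {
    assert(eval_with_context(2.0) == 11.0);  // 1*4 + 2*2 + 3
}
```

Note that `f` never leaves `eval_with_context` here. Returning or storing `f` beyond that scope would leave it referring to a destroyed `ctx`.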

How does that work? The assumption which made copying stuff around necessary in the first place was that the function object may outlive the scope of the original closure. Which means, of course, that it also outlives the structure that it holds a reference to, as well as anything referenced from inside that struct.

How is this supposed to work? And since I can't see how it could possibly work, is there a better well-known recipe to address this? (one that doesn't reference invalid objects)

Upvotes: 8

Views: 2245

Answers (2)

Cassio Neri
Reputation: 20533

I'm sorry that the article wasn't clear enough. (I'm the author.)

The advised technique is indeed not supposed to work when the std::function object outlives the scope of the original closure. In that case you must not use std::ref, and you must pay the price of copying and, potentially, of a heap allocation.
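As a minimal sketch of that case (the names `make_poly` and `demo_make_poly` are made up for illustration), a factory that returns the std::function has to let it own a copy of the captures:

```cpp
#include <cassert>
#include <functional>

// The returned std::function outlives this call, so the closure must be
// stored by value. With three doubles captured, the copy may need a heap
// allocation, depending on the implementation's small-object buffer.
std::function<double(double)> make_poly(double a, double b, double c) {
    return [a, b, c](double x) { return a * x * x + b * x + c; };
}

// Small self-check; call it from main() when trying this out.
void demo_make_poly() {
    auto f = make_poly(1.0, 2.0, 3.0);
    assert(f(2.0) == 11.0);  // 1*4 + 2*2 + 3
}
```

Using std::ref on the lambda inside `make_poly` would compile as well, but the returned object would then refer to a closure that no longer exists.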

The point of the article is this: when there's no lifetime issue (a case which, as nimrodm pointed out, is quite common), the user can tell std::function to take the closure object by reference instead of by value. Obviously, the user cannot magically change the signature of std::function's constructor for one particular call. That's where std::reference_wrapper and std::ref come in. The client passes a std::reference_wrapper object (created by std::ref) to std::function's constructor. What gets copied is then this wrapper, which is small, should fit in the small-object-optimisation buffer, and acts as a "reference" to the original closure object.
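A minimal sketch of the two variants, assuming a closure that holds three doubles. The function names are hypothetical, and whether the by-value version actually allocates depends on the implementation:

```cpp
#include <cassert>
#include <functional>

// By value: the whole closure (three doubles) is copied into the
// std::function, which may require a heap allocation.
double call_by_value(double x) {
    double a = 1.0, b = 2.0, c = 3.0;
    auto closure = [a, b, c](double v) { return a * v * v + b * v + c; };
    std::function<double(double)> f = closure;
    return f(x);
}

// By reference: only a std::reference_wrapper is copied into the
// std::function. The closure itself stays where it is.
double call_by_reference(double x) {
    double a = 1.0, b = 2.0, c = 3.0;
    auto closure = [a, b, c](double v) { return a * v * v + b * v + c; };
    std::function<double(double)> f = std::ref(closure);
    return f(x);  // fine: closure is still alive at this point
}

// Small self-check; call it from main() when trying this out.
void demo_ref() {
    assert(call_by_value(2.0) == 11.0);      // 1*4 + 2*2 + 3
    assert(call_by_reference(2.0) == 11.0);  // same result, no closure copy
}
```

Both versions compute the same value. The only difference is what std::function stores, which is why the second one is restricted to cases where `f` does not outlive `closure`.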

Here you can see the impact on the performance of std::function construction (sure, this is just one point of consideration amongst many others). In this example the closure object contains 3 doubles, and using std::ref makes the construction 7.5 times faster (YMMV):

[image: benchmark results]

One can inspect the generated assembly by clicking on the "Assembly" tab of the link above. A notable difference between the two versions is that the slower one contains this line:

callq  404430 <operator new(unsigned long)@plt>

This confirms that there's a call to operator new which is absent from the faster alternative.

Upvotes: 2

nimrodm
Reputation: 23829

I don't think it's supposed to work if the function object does outlive its calling function (and you're capturing references to objects that are on the stack).

In many practical cases the function object is used locally and does not outlive its caller, so you can avoid the heap allocation (then again, the compiler might be able to optimize away the references, in which case the whole struct technique is probably unnecessary).

Here's a simple test which compiles but crashes (tested on clang in C++14 mode).

Upvotes: 4