Reputation: 25311
Consider the following code:
class A
{
B* b; // an A object owns a B object
A() : b(NULL) { } // we don't know what b will be when constructing A
void calledVeryOften(…)
{
if (b)
delete b;
b = new B(param1, param2, param3, param4);
}
};
My goal: I need to maximize performance, which, in this case, means minimizing the amount of memory allocations.
The obvious thing to do here is to change B* b;
to B b;
. I see two problems with this approach:
b
in the constructor. Since I don't know what b
will be, this means I need to pass dummy values to B's constructor. Which, IMO, is ugly.calledVeryOften()
, I'll have to do something like this: b = B(…)
, which is wrong for two reasons:
b
won't be called.b
, then the destructor of the temporary instance will be called. The copy and the destructor call could be avoided. Worse, calling the destructor could very well result in undesired behavior.So what solutions do I have to avoid using new
? Please keep in mind that:
Upvotes: 13
Views: 878
Reputation: 503775
I liked Klaim's answer, so I wrote this up real fast. I don't claim perfect correctness but it looks pretty good to me. (i.e., the only testing it has is the sample main
below)
It's a generic lazy-initializer. The space for the object is allocated once, and the object starts at null. You can then create
, over-writing previous objects, with no new memory allocations.
It implements all the necessary constructors, destructor, copy/assignment, swap, yadda-yadda. Here you go:
#include <cassert>
#include <new>
template <typename T>
class lazy_object
{
public:
// types
typedef T value_type;
typedef const T const_value_type;
typedef value_type& reference;
typedef const_value_type& const_reference;
typedef value_type* pointer;
typedef const_value_type* const_pointer;
// creation
lazy_object(void) :
mObject(0),
mBuffer(::operator new(sizeof(T)))
{
}
lazy_object(const lazy_object& pRhs) :
mObject(0),
mBuffer(::operator new(sizeof(T)))
{
if (pRhs.exists())
{
mObject = new (buffer()) T(pRhs.get());
}
}
lazy_object& operator=(lazy_object pRhs)
{
pRhs.swap(*this);
return *this;
}
~lazy_object(void)
{
destroy();
::operator delete(mBuffer);
}
// need to make multiple versions of this.
// variadic templates/Boost.PreProccesor
// would help immensely. For now, I give
// two, but it's easy to make more.
void create(void)
{
destroy();
mObject = new (buffer()) T();
}
template <typename A1>
void create(const A1 pA1)
{
destroy();
mObject = new (buffer()) T(pA1);
}
void destroy(void)
{
if (exists())
{
mObject->~T();
mObject = 0;
}
}
void swap(lazy_object& pRhs)
{
std::swap(mObject, pRhs.mObject);
std::swap(mBuffer, pRhs.mBuffer);
}
// access
reference get(void)
{
return *get_ptr();
}
const_reference get(void) const
{
return *get_ptr();
}
pointer get_ptr(void)
{
assert(exists());
return mObject;
}
const_pointer get_ptr(void) const
{
assert(exists());
return mObject;
}
void* buffer(void)
{
return mBuffer;
}
// query
const bool exists(void) const
{
return mObject != 0;
}
private:
// members
pointer mObject;
void* mBuffer;
};
// explicit swaps for generality
template <typename T>
void swap(lazy_object<T>& pLhs, lazy_object<T>& pRhs)
{
pLhs.swap(pRhs);
}
// if the above code is in a namespace, don't put this in it!
// specializations in global namespace std are allowed.
namespace std
{
template <typename T>
void swap(lazy_object<T>& pLhs, lazy_object<T>& pRhs)
{
pLhs.swap(pRhs);
}
}
// test use
#include <iostream>
int main(void)
{
// basic usage
lazy_object<int> i;
i.create();
i.get() = 5;
std::cout << i.get() << std::endl;
// asserts (not created yet)
lazy_object<double> d;
std::cout << d.get() << std::endl;
}
In your case, just create a member in your class: lazy_object<B>
and you're done. No manual releases or making copy-constructors, destructors, etc. Everything is taken care of in your nice, small re-usable class. :)
Removed the need for vector, should save a bit of space and what-not.
This uses aligned_storage
and alignment_of
to use the stack instead of heap. I used boost, but this functionality exists in both TR1 and C++0x. We lose the ability to copy, and therefore swap.
#include <boost/type_traits/aligned_storage.hpp>
#include <cassert>
#include <new>
template <typename T>
class lazy_object_stack
{
public:
// types
typedef T value_type;
typedef const T const_value_type;
typedef value_type& reference;
typedef const_value_type& const_reference;
typedef value_type* pointer;
typedef const_value_type* const_pointer;
// creation
lazy_object_stack(void) :
mObject(0)
{
}
~lazy_object_stack(void)
{
destroy();
}
// need to make multiple versions of this.
// variadic templates/Boost.PreProccesor
// would help immensely. For now, I give
// two, but it's easy to make more.
void create(void)
{
destroy();
mObject = new (buffer()) T();
}
template <typename A1>
void create(const A1 pA1)
{
destroy();
mObject = new (buffer()) T(pA1);
}
void destroy(void)
{
if (exists())
{
mObject->~T();
mObject = 0;
}
}
// access
reference get(void)
{
return *get_ptr();
}
const_reference get(void) const
{
return *get_ptr();
}
pointer get_ptr(void)
{
assert(exists());
return mObject;
}
const_pointer get_ptr(void) const
{
assert(exists());
return mObject;
}
void* buffer(void)
{
return mBuffer.address();
}
// query
const bool exists(void) const
{
return mObject != 0;
}
private:
// types
typedef boost::aligned_storage<sizeof(T),
boost::alignment_of<T>::value> storage_type;
// members
pointer mObject;
storage_type mBuffer;
// non-copyable
lazy_object_stack(const lazy_object_stack& pRhs);
lazy_object_stack& operator=(lazy_object_stack pRhs);
};
// test use
#include <iostream>
int main(void)
{
// basic usage
lazy_object_stack<int> i;
i.create();
i.get() = 5;
std::cout << i.get() << std::endl;
// asserts (not created yet)
lazy_object_stack<double> d;
std::cout << d.get() << std::endl;
}
And there we go.
Upvotes: 8
Reputation: 279215
A quick test of Martin York's assertion that this is a premature optimisation, and that new/delete are optimised well beyond the ability of mere programmers to improve. Obviously the questioner will have to time his own code to see whether avoiding new/delete helps him, but it seems to me that for certain classes and uses it will make a big difference:
#include <iostream>
#include <vector>
int g_construct = 0;
int g_destruct = 0;
struct A {
std::vector<int> vec;
A (int a, int b) : vec((a*b) % 2) { ++g_construct; }
~A() {
++g_destruct;
}
};
int main() {
const int times = 10*1000*1000;
#if DYNAMIC
std::cout << "dynamic\n";
A *x = new A(1,3);
for (int i = 0; i < times; ++i) {
delete x;
x = new A(i,3);
}
#else
std::cout << "automatic\n";
char x[sizeof(A)];
A* yzz = new (x) A(1,3);
for (int i = 0; i < times; ++i) {
yzz->~A();
new (x) A(i,3);
}
#endif
std::cout << g_construct << " constructors and " << g_destruct << " destructors\n";
}
$ g++ allocperf.cpp -oallocperf -O3 -DDYNAMIC=0 -g && time ./allocperf
automatic
10000001 constructors and 10000000 destructors
real 0m7.718s
user 0m7.671s
sys 0m0.030s
$ g++ allocperf.cpp -oallocperf -O3 -DDYNAMIC=1 -g && time ./allocperf
dynamic
10000001 constructors and 10000000 destructors
real 0m15.188s
user 0m15.077s
sys 0m0.047s
This is roughly what I expected: the GMan-style (destruct/placement new) code takes twice as long, and is presumably doing twice as much allocation. If the vector member of A is replaced with an int, then the GMan-style code takes a fraction of a second. That's GCC 3.
$ g++-4 allocperf.cpp -oallocperf -O3 -DDYNAMIC=1 -g && time ./allocperf
dynamic
10000001 constructors and 10000000 destructors
real 0m5.969s
user 0m5.905s
sys 0m0.030s
$ g++-4 allocperf.cpp -oallocperf -O3 -DDYNAMIC=0 -g && time ./allocperf
automatic
10000001 constructors and 10000000 destructors
real 0m2.047s
user 0m1.983s
sys 0m0.000s
This I'm not so sure about, though: now the delete/new takes three times as long as the destruct/placement new version.
[Edit: I think I've figured it out - GCC 4 is faster on the 0-sized vectors, in effect subtracting a constant time from both versions of the code. Changing (a*b)%2
to (a*b)%2+1
restores the 2:1 time ratio, with 3.7s vs 7.5]
Note that I've not taken any special steps to correctly align the stack array, but printing the address shows it's 16-aligned.
Also, -g doesn't affect the timings. I left it in accidentally after I was looking at the objdump to check that -O3 hadn't completely removed the loop. That pointers called yzz because searching for "y" didn't go quite as well as I'd hoped. But I've just re-run without it.
Upvotes: 3
Reputation: 86313
Like the others have already suggested: Try placement new..
Here is a complete example:
#include <new>
#include <stdio.h>
class B
{
public:
int dummy;
B (int arg)
{
dummy = arg;
printf ("C'Tor called\n");
}
~B ()
{
printf ("D'tor called\n");
}
};
void called_often (B * arg)
{
// call D'tor without freeing memory:
arg->~B();
// call C'tor without allocating memory:
arg = new(arg) B(10);
}
int main (int argc, char **args)
{
B test(1);
called_often (&test);
}
Upvotes: 1
Reputation: 85957
Erm, is there some reason you can't do this?
A() : b(new B()) { }
void calledVeryOften(…)
{
b->setValues(param1, param2, param3, param4);
}
(or set them individually, since you don't have access to the B
class - those values do have mutator-methods, right?)
Upvotes: 0
Reputation: 84151
I'd go with boost::scoped_ptr here:
class A: boost::noncopyable
{
typedef boost::scoped_ptr<B> b_ptr;
b_ptr pb_;
public:
A() : pb_() {}
void calledVeryOften( /*…*/ )
{
pb_.reset( new B( params )); // old instance deallocated
// safely use *pb_ as reference to instance of B
}
};
No need for hand-crafted destructor, A
is non-copyable, as it should be in your original code, not to leak memory on copy/assignment.
I'd suggest to re-think the design though if you need to re-allocate some inner state object very often. Look into Flyweight and State patterns.
Upvotes: 0
Reputation: 69672
Simply reserve the memory required for b (via a pool or by hand) and reuse it each time you delete/new instead of reallocating each time.
Example :
class A
{
B* b; // an A object owns a B object
bool initialized;
public:
A() : b( malloc( sizeof(B) ) ), initialized(false) { } // We reserve memory for b
~A() { if(initialized) destroy(); free(b); } // release memory only once we don't use it anymore
void calledVeryOften(…)
{
if (initialized)
destroy();
create();
}
private:
void destroy() { b->~B(); initialized = false; } // hand call to the destructor
void create( param1, param2, param3, param4 )
{
b = new (b) B( param1, param2, param3, param4 ); // in place new : only construct, don't allocate but use the memory that the provided pointer point to
initialized = true;
}
};
In some cases a Pool or ObjectPool could be a better implementation of the same idea.
The construction/destruction cost will then only be dependante on the constructor and destructor of the B class.
Upvotes: 8
Reputation: 791371
If B
correctly implements its copy assignment operator then b = B(...)
should not call any destructor on b
. It is the most obvious solution to your problem.
If, however, B
cannot be appropriately 'default' initialized you could do something like this. I would only recommend this approach as a last resort as it is very hard to get safe. Untested, and very probably with corner case exception bugs:
// Used to clean up raw memory of construction of B fails
struct PlacementHelper
{
PlacementHelper() : placement(NULL)
{
}
~PlacementHelper()
{
operator delete(placement);
}
void* placement;
};
void calledVeryOften(....)
{
PlacementHelper hp;
if (b == NULL)
{
hp.placement = operator new(sizeof(B));
}
else
{
hp.placement = b;
b->~B();
b = NULL; // We can't let b be non-null but point at an invalid B
}
// If construction throws, hp will clean up the raw memory
b = new (placement) B(param1, param2, param3, param4);
// Stop hp from cleaning up; b points at a valid object
hp.placement = NULL;
}
Upvotes: 3
Reputation: 4723
Are you sure that memory allocation is the bottleneck you think it is? Is B's constructor trivially fast?
If memory allocation is the real problem, then placement new or some of the other solutions here might well help.
If the types and ranges of the param[1..4] are reasonable, and the B constructor "heavy", you might also consider using a cached set of B. This presumes you are actually allowed to have more than one at a time, that it does not front a resource for example.
Upvotes: 1
Reputation: 57525
How about allocating the memory for B once (or for it's biggest possible variant) and using placement new?
A would store char memB[sizeof(BiggestB)];
and a B*
. Sure, you'd need to manually call the destructors, but no memory would be allocated/deallocated.
void* p = memB;
B* b = new(p) SomeB();
...
b->~B(); // explicit destructor call when needed.
Upvotes: 5