Logan Yang
Logan Yang

Reputation: 2594

push_back objects into vector memory issue C++

Compare the two ways of initializing vector of objects here.

1.
    vector<Obj> someVector;
    Obj new_obj;
    someVector.push_back(new_obj);

2.
    vector<Obj*> ptrVector;
    Obj* objptr = new Obj();
    ptrVector.push_back(objptr);

The first one push_back actual object instead of the pointer of the object. Is vector push_back copying the value being pushed? My problem is, I have huge object and very long vectors, so I need to find a best way to save memory.

Upvotes: 2

Views: 4699

Answers (4)

Jerry Coffin
Jerry Coffin

Reputation: 490178

Under the circumstances, I'd consider a third possibility: use std::deque instead of std::vector.

This is kind of a halfway point between the two you've given. A vector<obj> allocates one huge block to hold all the instances of the objects in the vector. A vector<obj *> allocates one block of pointers, but each instance of the object in a block of its own. Therefore, you get N objects plus N pointers.

A deque will create a block of pointers and a number of blocks of objects -- but (at least normally) it'll put a number of objects (call it M) together into a single block, so you get a block of N/M pointers, and N/M of objects.

This avoids many of the shortcomings of either a vector of objects or a vector of pointers. Once you allocate a block of objects, you never have to reallocate or copy them. You do (or may) eventually have to reallocate the block of pointers, but it'll be smaller (by a factor of M) than the vector of pointers if you try to do it by hand.

One caveat: if you're using Microsoft's compiler/standard library, this may not work very well -- they have some strange logic (still present up through VS 2013 RC) that means if your object size is larger than 16, you'll get only one object per block -- i.e., equivalent to your vector<obj *> idea.

Upvotes: 0

Yakk - Adam Nevraumont
Yakk - Adam Nevraumont

Reputation: 275500

Of the two above options, this third not included one is the most efficient:

std::vector<Obj> someVector;
someVector.reserve(preCalculatedSize);
for (int i = 0; i < preCalculatedSize; ++i)
  someVector.emplace_back();

emplace_back directly constructs the object into the memory that the vector arranges for it. If you reserve prior to use, you can avoid reallocation and moving.

However, if the objects truly are large, then the advantages of cache-coherency are less. So a vector of smart pointers makes sense. Thus the forth option:

std::vector< std::unique_ptr<Obj> > someVector;
std::unique_ptr<Obj> element( new Obj );
someVector.push_back( std::move(element) );

is probably best. Here, we represent the lifetime of the data and how it is accessed in the same structure with nearly zero overhead, preventing it from getting out of sync.

You have to explicitly std::move the std::unique_ptr around when you want to move it. If you need a raw pointer for whatever reason, .get() is how to access it. -> and * and explicit operator bool are all overridden, so you only really need to call .get() when you have an interface that expects a Obj*.

Both of these solutions require C++11. If you lack C++11, and the objects truly are large, then the "vector of pointers to data" is acceptable.

In any case, what you really should do is determine which matches your model best, check performance, and only if there is an actual performance problem do optimizations.

Upvotes: 2

Vaughn Cato
Vaughn Cato

Reputation: 64308

Using a vector<Obj> will take less memory to store if you can reserve the size ahead of time. vector<Obj *> will necessarily use more memory than vector<Obj> if the vector doesn't have to be reallocated, since you have the overhead of the pointers and the overhead of dynamic memory allocation. This overhead may be relatively small though if you only have a few large objects.

However, if you are very close to running out of memory, using vector<Obj> may cause a problem if you can't reserve the correct size ahead of time because you'll temporarily need extra storage when reallocating the vector.

Having a large vector of large objects may also cause an issue with memory fragmentation. If you can create the vector early in the execution of your program and reserve the size, this may not be an issue, but if the vector is created later, you might run into a problem due to memory holes on the heap.

Upvotes: 1

Chad
Chad

Reputation: 19032

If your Obj class doesn't require polymorphic behavior, then it is better to simply store the Obj types directly in the vector<Obj>.

If you store objects in vector<Obj*>, then you are assuming the responsibility of manually deallocating those objects when they are no longer needed. Better, in this case, to use vector<std::unique_ptr<Obj>> if possible, but again, only if polymorphic behavior is required.

The vector will store the Obj objects on the heap (by default, unless you override the allocator in the vector template). These objects will be stored in contiguous memory, which can also give you better cache locality, depending upon your use case.

The drawback to using vector<Obj> is that frequent insertion/removal from the vector may cause reallocation and copying of your Obj objects. However, that usually will not be the bottleneck in your application, and you should profile it if you feel like it is.

With C++11 move semantics, the implications of copying can be much reduced.

Upvotes: 1

Related Questions