Pandoro
Pandoro

Reputation: 1062

Accessing std::vector elements slows down with vector size

Today I ran into the problem that accessing the vector elements slowed down with the size of the vector. As it is not my code, I cannot post it, so please bear with me. I will try to describe it as detailed as possible.

The functionality of the code is as follows: 1. a Dataset class, takes a .txt file, which contains file names. These point to standard png images which need to be loaded. This is done by an Image<T> class. The images are loaded as Image<unsigned char> and pushed back into a std::Vector. 2. After the loading of the data has been done. I can access the vector in my dataset in order to work with it. So it looks something like this:

Dataset d;
d.init("filenames_list.txt"); //Loads the images
for(int i=0; i< d.getDatavector().size(); i++){
  Image<unsigned char> current = d.getDatavector()[i];
  //Do work on current image here.
}

Here getDatavector() will return a std::Vector<Image<unsigned char> >. The images hold three ints, for width, height and the number of channels and furthermore a Boost shared pointer that points to the interleaved data.

For small testruns, I have a list of files which contains about 150 images. Running the program with this works fine and speed measurements tell me that

Image<unsigned char> current = d.getDatavector()[i];

takes about 10ms to be completed. If however I want to work on my full dataset of 1500 images, the above line takes about 500ms to complete. I've tried to do many different things to fix it, but I am somewhat limited by the general structure of the code and by the memory. Because if I do the following:

const std::Vector<Image<unsigned char> > data = d.getDatavector();

before the loop, it runs very fast, but I soon run out of memory.

I know my problem description is somewhat vague, and I am not hoping for the exact solution, but I am hoping for some tips on where to look. I searched for similar problems, but people only seem concerned with the general speed of vectors versus arrays. My problem is though, that the speed degrades with the length of the vector! If someone has seen this kind of problem, any suggestions are very much welcome!

So far I have tried accessing the content using the std::vector::iterator or using (d.getDatavector().data()) as a pointer. Nothing seems to improve the speed of it.

Upvotes: 1

Views: 375

Answers (4)

Michael Shmalko
Michael Shmalko

Reputation: 710

As already mentioned, the root of the problem seems to be in that getDatavector() returns a full copy of the vector, and the solution would be to return a reference (or a pointer instead).
You also have a similar problem with Image<unsigned char> current = ... where a copy of an image is being made as well.
One solution to these problems would be to use a direct access to the image as:

Image<unsigned char>* getImage(int idx)
{
 if (idx < _myVector.size())
 {
   return &_myVector[idx].Image;
 }
 return NULL;
}

Edit: version returning reference

    Image<unsigned char>& getImage(int idx)
    {
     if (idx < _myVector.size())
     {
       return _myVector[idx].Image;
     }
     // throw exception here;
    }

Obviously this will not work if you must have a copy of each image.

Upvotes: 1

Praetorian
Praetorian

Reputation: 109289

What does the signature of getDataVector() look like? Is it

std::vector<Image<unsigned char>> getDataVector();

If so, the function is returning the vector by value, and every time you write d.getDatavector()[i] a copy of the vector is made, the i element is copied out of the vector and then the vector itself is destroyed.

If you can modify the Dataset class change the function to

std::vector<Image<unsigned char>> const& getDataVector();

Now copies won't be made every time the function is called.

If you're unable to modify the class, make a single copy before entering the loop, and then use the local variable within the loop.

It is impossible for the problem to be the indexing since std::vector's underlying data array is required to be contiguous and so accessing the ith element is as simple as adding i to the pointer marking the start address of the data array and dereferencing the result.

Upvotes: 5

&#214;&#246; Tiib
&#214;&#246; Tiib

Reputation: 11031

The reason is that you return vector by value in cycle.

Make your getDatavector() to return a std::Vector<Image<unsigned short> >& or std::Vector<Image<unsigned short> > const& not std::Vector<Image<unsigned short> >

Upvotes: 5

Marshall Clow
Marshall Clow

Reputation: 16700

Are you using C++11, or an earlier C++?

If an earlier C++11, and getDataVector returns a vector, then it may have to be copied. If you're using C++11, it can be moved into the return variable w/o copying

that could be the source of your slowdown.

Accessing an element of a vector is a constant-time operation.

Upvotes: 2

Related Questions