Tyson
Tyson

Reputation: 1238

modifying values in pointers is very slow?

I'm working with a huge amount of data stored in an array, and am trying to optimize the amount of time it takes to access and modify it. I'm using Window, c++ and VS2015 (Release mode).

I ran some tests and don't really understand the results I'm getting, so I would love some help optimizing my code.

First, let's say I have the following class:

class foo
{
public:
    int x;

    foo() 
    {
        x = 0;
    }

    void inc()
    {
        x++;
    }

    int X()
    {
        return x;
    }

    void addX(int &_x)
    {
        _x++;
    }

};

I start by initializing 10 million pointers to instances of that class into a std::vector of the same size.

#include <vector>
int count = 10000000;
std::vector<foo*> fooArr;
fooArr.resize(count);
for (int i = 0; i < count; i++)
{
     fooArr[i] = new foo();
}

When I run the following code, and profile the amount of time it takes to complete, it takes approximately 350ms (which, for my purposes, is far too slow):

for (int i = 0; i < count; i++)
{
    fooArr[i]->inc(); //increment all elements
}

To test how long it takes to increment an integer that many times, I tried:

int x = 0;
for (int i = 0; i < count; i++)
{
    x++;
}

Which returns in <1ms.

I thought maybe the number of integers being changed was the problem, but the following code still takes 250ms, so I don't think it's that:

for (int i = 0; i < count; i++)
{
    fooArr[0]->inc(); //only increment first element
}

I thought maybe the array index access itself was the bottleneck, but the following code takes <1ms to complete:

int x;
for (int i = 0; i < count; i++)
{
    x = fooArr[i]->X(); //set x
}

I thought maybe the compiler was doing some hidden optimizations on the loop itself for the last example (since the value of x will be the same during each iteration of the loop, so maybe the compiler skips unnecessary iterations?). So I tried the following, and it takes 350ms to complete:

int x;
for (int i = 0; i < count; i++)
{
    fooArr[i]->addX(x); //increment x inside foo function
}

So that one was slow again, but maybe only because I'm incrementing an integer with a pointer again.

I tried the following too, and it returns in 350ms as well:

for (int i = 0; i < count; i++)
{
    fooArr[i]->x++;
}

So am I stuck here? Is ~350ms the absolute fastest that I can increment an integer, inside of 10million pointers in a vector? Or am I missing some obvious thing? I experimented with multithreading (giving each thread a different chunk of the array to increment) and that actually took longer once I started using enough threads. Maybe that was due to some other obvious thing I'm missing, so for now I'd like to stay away from multithreading to keep things simple.

I'm open to trying containers other than a vector too, if it speeds things up, but whatever container I end up using, I need to be able to easily resize it, remove elements, etc.

I'm fairly new to c++ so any help would be appreciated!

Upvotes: 0

Views: 362

Answers (2)

FranMowinckel
FranMowinckel

Reputation: 4343

Try the following:

int count = 10000000;
std::vector<foo> fooArr;
fooArr.resize(count, foo());

for (auto it= fooArr.begin(); it != fooArr.end(); ++it) {
   it->inc();
} 

The new is killing you and actually you don't need it because resize inserts elements at the end if the size it's greater (check the docs: std::vector::resize)

And the other thing it's about using pointers which IMHO should be avoided until the last moment and it's uneccesary in this case. The performance should be a little bit faster in this case since you get better locality of your references (see cache locality). If they were polymorphic or something more complicated it might be different.

Upvotes: 0

Ripi2
Ripi2

Reputation: 7198

Let's look from the CPU point of view.

Incrementing an integer means I have it in a CPU register and just increments it. This is the fastest option.

I'm given an address (vector->member) and I must copy it to a register, increment, and copy the result back to the address. Worst: My CPU cache is filled with vector pointers, not with vector-member pointers. Too few hits, too much cache "refueling".

If I could manage to have all those members just in a vector, CPU cache hits would be much more frequent.

Upvotes: 1

Related Questions