Reputation: 123
I'm curious which method is faster when accessing vectors.
For the sake of simplicity, lets say I have two objects: Player
and Ship
.
There's a vector of player pointers vector<Player*> players
and each player object contains a vector of ship pointers vector<Ship*> ships
, and then each ship has several functions that it can call, and so on.
In these situations, is it faster to access these functions directly? Or to create a temporary object pointer to access everything?
Is it faster to do this:
for (int i = 0; i < players.size(); i++)
{
for (int j = 0; j < players.at(i)->ships.size(); j++)
{
players.at(i)->ships.at(j)->update();
if (
(players.at(i)->ships.at(j)->get_x() > 0) &&
(players.at(i)->ships.at(j)->get_x() < screen_x) &&
(players.at(i)->ships.at(j)->get_y() > 0) &&
(players.at(i)->ships.at(j)->get_y() < screen_y)
)
{
players.at(i)->visible.push_back(j);
}
}
}
Or is it faster to create temporary pointers so that the vectors don't need to be continually accessed:
for (int i = 0; i < players.size(); i++)
{
Player* play = players.at(i);
for (int j = 0; j < play->ships.size(); j++)
{
Ship* ship = play->ships.at(j);
ship->update();
int ship_x = ship->get_x();
int ship_y = ship->get_y();
if (
(ship_x > 0) &&
(ship_x < screen_x) &&
(ship_y > 0) &&
(ship_y < screen_y)
)
{
play->visible.push_back(j);
}
}
}
I know the second is visually neater, but don't really know if it's necessarily faster.
Thoughts?
Upvotes: 0
Views: 254
Reputation: 123
Thanks for the info everyone. Since there wasn't a clear cut "option A is definitely faster than option B", I took your advice and did a bench test.
Here's some code I put together.
Basically, it creates 100 players. Each player has a vector of 100 ships. Each ship has a vector of 100 crew. (Once it ran, it consumed about 500MB of RAM).
I ran the test both unoptimized and optimized (the -O3 flag)
Test 1 was the chain pointers (ie. player->ship->crew->number, etc.) Test 2 was the same as Test 1, but I replaced all the .at() with operator[]. Test 3 was using temporary pointers to access everything.
I ran each test multiple times and averaged the results. Here's my results:
Unoptimized:
Test 1: 13000
Test 2: 5500
Test 3: 2800
Optimized:
Test 1: 1050
Test 2: 650
Test 3: 450
It shows that optimizing greatly increased speed in all cases. Either way though, optimized or unoptimized, .at() definitely slows things WAY down. Using the operator[] was significantly faster. But in the end, using a temporary pointer was the fastest in all cases.
#include <vector>
#include <ctime>
#include <iostream>
using namespace std;
class People
{
public:
vector<int> number;
};
class Ship
{
public:
Ship(int f);
vector<People*> crew;
int get_x();
int get_y();
private:
int x;
int y;
};
Ship::Ship(int f)
{
//Assign some nonsense for testing purposes
x = f * 50;
y = f * 75;
}
int Ship::get_x()
{
return x;
}
int Ship::get_y()
{
return y;
}
class Player
{
public:
vector<Ship*> ships;
};
int main(int argc, char *argv[])
{
vector<Player*> players;
int start, end;
unsigned int i, j, k, l;
//Create 100 players, each with 100 ships, and each ship with 100 crew.
for (i = 0; i < 100; i++)
{
Player* play = new Player;
players.push_back(play);
for (j = 0; j < 100; j++)
{
Ship* new_ship = new Ship(j);
play->ships.push_back(new_ship);
for (k = 0; k < 100; k++)
{
People* newbie = new People;
new_ship->crew.push_back(newbie);
for (l = 0; l < 100; l++)
{
newbie->number.push_back(0);
}
newbie->number.clear();
}
}
}
//Test 1
start = clock();
for (i = 0; i < players.size(); i++)
{
for (j = 0; j < players.at(i)->ships.size(); j++)
{
for (k = 0; k < players.at(i)->ships.at(j)->crew.size(); k++)
{
for (l = 0; l < 100; l++)
{
//Give each crew some number to hold on to.
players.at(i)->ships.at(j)->crew.at(k)->number.push_back(players.at(i)->ships.at(j)->get_x() * players.at(i)->ships.at(j)->get_y() + l);
}
//Clear the number list for the next test.
players.at(i)->ships.at(j)->crew.at(k)->number.clear();
}
}
}
end = clock();
cout << "Test 1: " << (end - start) << endl;
//Test 2
start = clock();
for (i = 0; i < players.size(); i++)
{
for (j = 0; j < players[i]->ships.size(); j++)
{
for (k = 0; k < players[i]->ships[j]->crew.size(); k++)
{
for (l = 0; l < 100; l++)
{
players[i]->ships[j]->crew[k]->number.push_back(players[i]->ships[j]->get_x() * players[i]->ships[j]->get_y() + l);
}
players[i]->ships[j]->crew[k]->number.clear();
}
}
}
end = clock();
cout << "Test 2: " << (end - start) << endl;
//Test 3
start = clock();
for (i = 0; i < players.size(); i++)
{
Player* temp_play = players.at(i);
for (j = 0; j < temp_play->ships.size(); j++)
{
Ship* temp_ship = temp_play->ships.at(j);
for (k = 0; k < temp_ship->crew.size(); k++)
{
People* temp_crew = temp_ship->crew.at(k);
for (l = 0; l < 100; l++)
{
temp_crew->number.push_back(temp_ship->get_x() * temp_ship->get_y() + l);
}
temp_crew->number.clear();
}
}
}
end = clock();
cout << "Test 3: " << (end - start) << endl;
return 0;
}
Upvotes: 0
Reputation: 490108
Seems to me that the emphasis on speed is misplaced. I think you should start by writing the code to be more readable:
auto is_visible = [=](Ship const &s) { return s.get_x() > 0 && s.get_x() < screen_x
&& s.get_y() > 0 && s.get_y() < screen_y;
};
for (auto & player : players)
std::copy_if(ships.begin(), ships.end(),
std::back_inserter(player.visible),
is_visible);
At least IMO, this is at least as safe as using at
for indexing, but probably at least as fast as using []
, and more readable than either one.
I should probably add one more point: visibility doesn't seem to depend on the player. At least from the way the code's written, all the players will have the same set of visible ships. If that's correct, you probably want to do something more like:
std::vector<Ship> visible;
std::copy_if(ships.begin(), ships.end(),
std::back_inserter(visible),
[=](Ship const &s) { return s.get_x() > 0 && s.get_x() < screen_x
&& s.get_y() > 0 && s.get_y() < screen_y; });
for (auto &player : players)
player.visible = visible;
Upvotes: 2
Reputation: 145
I think you're at the mercy of your optimizing compiler here. Either one might be faster, depending on how it gets optimized.
In the first version, it's possible that the compiler will decide to
pull out the players.at(i)->ships.at(j)
common subexpression,
possibly with the get_x()
or get_y()
turning it into something
that looks a lot like your second version.
In the second version, it's possible that reordering could move the
int ship_y = ship->get_y()
into the loop conditional so that it can
short circuit with ship_y > 0
.
In both, it might decide to turn the entire short circuit conditional into a sequence of fast bitwise and instructions, eliminating branches
But my guess is that you're not going to see much difference either way. Try dumping the assembly code to compare, and of course, profile it.
Upvotes: 1
Reputation: 39013
You should check and see which one is faster.
It might be the first or it might be the second. It will definitely be the first if most ships' X coordinate is negative.
However, if the second one looks better to you (it does to me, too), stick with it. Worry about performance when there's an actual performance issue.
Upvotes: 1