khstex

Reputation: 21

Efficiency of 2 vectors versus a vector of structs

I'm working on a C++ project where I need to iterate over a vector of strings, skipping any that have already been visited. When I visit one, I set its corresponding visited flag to 1 so it gets skipped afterwards. Which of the following solutions is faster?

Solution 1:

vector<string> stringsToVisit;
vector<int> stringVisited;

for (size_t i = 0; i < stringsToVisit.size(); ++i) {
    if (stringVisited[i] == 0) {
        string current = stringsToVisit[i];
        // ...Do Stuff...
        stringVisited[i] = 1;
    }
}

or

Solution 2:

struct StringInfo {
    string myString;
    int visited = 0;
};

vector<StringInfo> stringsToVisit;

for (size_t i = 0; i < stringsToVisit.size(); ++i) {
    if (stringsToVisit[i].visited == 0) {
        string current = stringsToVisit[i].myString;
        // ...Do Stuff...
        stringsToVisit[i].visited = 1;
    }
}

Upvotes: 1

Views: 195

Answers (2)

bcrist

Reputation: 1530

As Bernard notes, the time and memory complexity of both proposed solutions is identical, and the slightly more complex addressing required by the second solution isn't going to slow things down on modern processors. But I disagree with his suggestion that "Solution 2 is likely to be faster." We really don't know enough to even say that it should theoretically be faster, and except perhaps in a few degenerate situations, the difference in actual performance is likely to be unmeasurable.

The first iteration of the loop would indeed likely be slower. The cache is cold, and the first solution has to fetch two cache lines for the first elements (one from each vector), while the second solution needs only one. But after that, both solutions are doing a forward linear traversal, and the CPU will have no problem prefetching additional cache lines, so in most situations that initial extra overhead is unlikely to matter much.

On the other hand, you are writing data while in this loop, so some of the cache lines you access also get marked dirty (meaning their data needs to be written back to a shared cache or main memory eventually, and they get purged from any other cores' caches). In solution 1, depending on sizeof(string) and sizeof(int), only 5-25% of the cache lines are getting marked dirty. Solution 2, however, dirties every single one, so it may actually use more memory bandwidth.
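As a rough, back-of-the-envelope illustration of that ratio (a sketch only; it assumes 64-byte cache lines, and the actual sizeof values depend on your standard library), you can print how many visited flags versus how many StringInfo structs fit into one cache line:

#include <cstddef>
#include <cstdio>
#include <string>

struct StringInfo {
    std::string myString;
    int visited = 0;
};

int main() {
    constexpr std::size_t kCacheLine = 64;  // assumed cache-line size
    // Solution 1: the writes go into a separate vector<int>, so one dirty
    // cache line covers many visited flags.
    std::printf("ints per cache line:       %zu\n", kCacheLine / sizeof(int));
    std::printf("strings per cache line:    %zu\n", kCacheLine / sizeof(std::string));
    // Solution 2: every written flag lives inside a larger struct, so nearly
    // every cache line traversed gets dirtied.
    std::printf("StringInfo per cache line: %zu\n", kCacheLine / sizeof(StringInfo));
}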

So, some things that might make Solution 2 faster are:

  • The list of strings being processed is extremely short
  • ...Do Stuff... is very complicated (enough so that the cache lines holding the data get purged from L1 cache)

Some things that might make Solution 1 equivalent to, or faster than, Solution 2:

  • The list of strings being processed is moderate to large
  • ...Do Stuff... is not very complex, so the cache stays warm
  • The program is multithreaded, and another thread wants to read data from stringsToVisit at the same time.

The bottom line is, it probably doesn't matter.

Upvotes: 2

Bernard

Reputation: 5666

First of all, you should profile your code to check whether this piece of code is really the bottleneck, and accurately measure the amount of time each solution takes to run. That would give you the most reliable answer.
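For example, a minimal timing harness along these lines (a sketch only; the string count and the per-element work are placeholders standing in for your real data and ...Do Stuff...) would measure Solution 1, and the same loop rewritten over a vector<StringInfo> would give the number for Solution 2:

#include <chrono>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

int main() {
    // Placeholder data; substitute your real strings and real work.
    std::vector<std::string> stringsToVisit(100000, "example");
    std::vector<int> stringVisited(stringsToVisit.size(), 0);

    const auto start = std::chrono::steady_clock::now();
    std::size_t processed = 0;
    for (std::size_t i = 0; i < stringsToVisit.size(); ++i) {
        if (stringVisited[i] == 0) {
            processed += stringsToVisit[i].size();  // stand-in for ...Do Stuff...
            stringVisited[i] = 1;
        }
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    std::printf("processed %zu characters in %.3f ms\n", processed, elapsed.count());
}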

Nevertheless, here's my answer:

The time complexity of both solutions is O(n), so we are talking only about constant-factor optimizations here.

Solution 1 requires lookups into two different memory blocks in each iteration: stringsToVisit[i] and stringVisited[i]. This is less friendly to the CPU cache than Solution 2, where each iteration accesses a single struct stored contiguously in memory. As such, Solution 2 would perform better.

Solution 2 needs a more complicated indirect memory lookup than Solution 1 to access the visited member of the struct: (base address of stringsToVisit) + (index) * (struct size) + (displacement in struct). Nevertheless, this kind of lookup fits well into most processors' SIB (scale-index-base) addressing, so it typically compiles down to a single instruction and adds little or no overhead. It is also worth noting that an optimizing compiler might notice that you are accessing memory sequentially and avoid SIB addressing entirely, for example by stepping a pointer through the array instead.
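To make that address calculation concrete, here is a small sketch that reproduces base + index * (struct size) + (displacement in struct) by hand and checks it against the address the compiler produces (the displacement is computed at run time here purely for illustration):

#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

struct StringInfo {
    std::string myString;
    int visited = 0;
};

int main() {
    std::vector<StringInfo> stringsToVisit(4);
    const std::size_t i = 2;

    // Displacement of 'visited' inside the struct, measured from element 0.
    const char* elem0 = reinterpret_cast<const char*>(&stringsToVisit[0]);
    const std::ptrdiff_t displacement =
        reinterpret_cast<const char*>(&stringsToVisit[0].visited) - elem0;

    // base + index * (struct size) + (displacement in struct)
    const char* base = reinterpret_cast<const char*>(stringsToVisit.data());
    const void* computed = base + i * sizeof(StringInfo) + displacement;
    const void* actual = &stringsToVisit[i].visited;

    std::printf("computed = %p\nactual   = %p\n", computed, actual);  // identical
}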

Hence, Solution 2 is likely to be faster.

Upvotes: 0
