Jeremy

Reputation: 41

C++ overhead from string concatenation

I'm reading in a text file of random ASCII from an ifstream. I need to be able to put the whole message into a string type for character parsing. My current solution works, but I think I'm murdering process time on the lengthier files by using the equivalent of this:

std::string result;

for (std::string line; std::getline(std::cin, line); )
{
    result += line;
}

I'm concerned about the overhead associated with concatenating strings like this (it happens a few thousand times, with messages tens of thousands of characters long). I've spent the last few days browsing different potential solutions, but nothing quite fits. I don't know the length of the message ahead of time, so I don't think using a dynamically sized character array is my answer.

I read through a related SO thread that sounded almost applicable, but it still left me unsure.

Any suggestions?

Upvotes: 4

Views: 341

Answers (4)

Qortex

Reputation: 7466

The real problem is that you don't know the full size ahead of time, so you cannot allocate memory appropriately. I would expect that the performance hit you're seeing is related to that, not to the way strings are concatenated, since the standard library does that efficiently.

Thus, I would recommend deferring concatenation until you know the full size of your final string. That is, you start by storing all your strings in a big vector as in:

using namespace std;
vector<string> allLines;
size_t totalSize = 0;
// If you can have access to the total size of the data you want
// to read (size of the input file, ...) then just initialize totalSize
// and use only the second code snippet below.
for (string line; getline(cin, line); )
{
    allLines.push_back(line);
    totalSize += line.size();
}

Then, you can create your big string knowing its size in advance:

string finalString;
finalString.reserve(totalSize);
for (vector<string>::iterator itS = allLines.begin(); itS != allLines.end(); ++itS)
{
    finalString += *itS;
}

That said, you should only do this if you actually experience performance issues. Don't optimize things that don't need it; otherwise you will complicate your program with no noticeable benefit. The places that need optimizing are often counterintuitive and vary from environment to environment, so do this only if your profiling tool tells you to.

Upvotes: 1

Aaron

Reputation: 9193

You're copying result's underlying buffer each time it has to expand to take another line. Instead, pre-allocate the result and grow it exponentially:

std::string result;
result.reserve(1024); // pre-allocate a typical size

for (std::string line; std::getline(std::cin, line); )
{
    // every time we run out of space, double the available space
    while(result.capacity() < result.length() + line.length())
        result.reserve(result.capacity() * 2);

    result += line;
}

Upvotes: 0

Lightness Races in Orbit

Reputation: 385174

I'm too sleepy to put together any solid data for you but, ultimately, without knowing the size ahead of time you're always going to have to do something like this. And the truth is that your standard library implementation is smart enough to handle string resizing fairly efficiently. (That's despite the fact that there's no exponential growth guarantee for std::string, the way that there is for std::vector.)

So although you may see unwanted re-allocations in the first fifty or so iterations, after a while the re-allocated block becomes so large that re-allocations become rare.

If you profile and find that this is still a bottleneck, perhaps use std::string::reserve yourself with a typical quantity.
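As a rough way to see this in practice, a minimal sketch like the following prints the capacity each time the string actually re-allocates while lines are appended; the exact growth pattern depends on your standard library implementation:

#include <iostream>
#include <string>

int main()
{
    std::string result;
    auto lastCapacity = result.capacity();

    for (std::string line; std::getline(std::cin, line); )
    {
        result += line;

        // Report only when this append triggered a re-allocation.
        if (result.capacity() != lastCapacity)
        {
            lastCapacity = result.capacity();
            std::cout << "re-allocated, new capacity: " << lastCapacity << '\n';
        }
    }
}

On typical implementations the reports cluster at the start and quickly become rare as the block grows.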

Upvotes: 0

dan

Reputation: 1002

If you know the file size, call result's member function reserve() once up front.
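If you're reading from an ifstream, one way to apply this (a sketch only, with a placeholder filename) is to seek to the end once to get the size, then reserve before the read loop. The reserved amount is a slight over-estimate because it still counts the newline characters that getline() discards, which is harmless:

#include <fstream>
#include <iostream>
#include <string>

int main()
{
    // "message.txt" is just a placeholder filename.
    std::ifstream in("message.txt", std::ios::binary);

    // Find the file size by seeking to the end once.
    in.seekg(0, std::ios::end);
    const std::streamoff fileSize = in.tellg();
    in.seekg(0, std::ios::beg);

    std::string result;
    if (fileSize > 0)
        result.reserve(static_cast<std::size_t>(fileSize)); // one allocation up front

    for (std::string line; std::getline(in, line); )
    {
        result += line;
    }

    std::cout << result.size() << " characters read\n";
}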

Upvotes: 0
