Dollarslice
Dollarslice

Reputation: 10284

Why is this so much slower in C++?

I have converted this simple method from C# to C++. It reads a path table and populates a list of lists of ints (or a vector of vectors of ints).

A sample line from the path table would be something like

0 12 5 16 n

I realise there are better ways of doing this in general, but for now I just want to know why my C++ code is taking so much longer. e.g. 10 minutes as opposed to 10 seconds with the C# version. Here is my C++ code. I'm guessing I've done something a bit drastically wrong.

//Parses the text path vector into the engine
void Level::PopulatePathVectors(string pathTable)
{
    // Read the file line by line.
    ifstream myFile(pathTable);

        for (unsigned int i = 0; i < nodes.size(); i++)
        {
            pathLookupVectors.push_back(vector<vector<int>>());

            for (unsigned int j = 0; j < nodes.size(); j++)
            {
                string line;

                if (getline(myFile, line)) //Enter if a line is read successfully
                {
                    stringstream ss(line);
                    istream_iterator<int> begin(ss), end;
                    pathLookupVectors[i].push_back(vector<int>(begin, end));
                }
            }
        }
    myFile.close();
}

Here is the C# version:

private void PopulatePathLists(string pathList)
{
    // Read the file and display it line by line.
    StreamReader streamReader = new StreamReader(pathList);

    for (int i = 0; i < nodes.Count; i++)
    {
        pathLookupLists.Add(new List<List<int>>());

        for (int j = 0; j < nodes.Count; j++)
        {
            string str = streamReader.ReadLine();
            pathLookupLists[i].Add(new List<int>());

            //For every string (list of ints) - put each one into these lists
            int count = 0;
            string tempString = "";

            while (str[count].ToString() != "n") //While character does not equal null terminator
            {
                if (str[count].ToString() == " ") //Character equals space, set the temp string 
                                                  //as the node index, and move on
                {
                    pathLookupLists[i][j].Add(Convert.ToInt32(tempString));
                    tempString = "";
                }
                else //If characters are adjacent, put them together
                {
                    tempString = tempString + str[count];
                }
                count++;
            }
        }
    }
    streamReader.Close();
}

Sorry this is so specific, but I'm stumped.

EDIT - A lot of people have said they have tested this code, and it takes mere seconds for them. All I know is, if I comment out the call to this function, the program loads in seconds. With the function call it takes 5 minutes. Almost exactly. I'm really stumped. What could the problem be?

Here is the PathTable it's using.

EDIT - I tried running the function in a program on its own, and it took a few seconds, but I'm afraid I don't know enough to be able to know how to fix this problem. Obviously it's not the code. What could it be? I checked where it's being called to see if there were multiple calls, but there aren't. It's in a constructor of the game's level and that is only called once.

EDIT - I understand that the code is not the best it could be, but that isn't the point here. It runs quickly on its own - about 3 seconds and that's fine for me. The problem I'm trying to solve is why it takes so much longer inside the project.

EDIT - I commented out all of the game code apart from the main game loop. I placed the method into the initialize section of the code which is run once on start up. Apart from a few methods setting up a window it's now pretty much the same as the program with ONLY the method in, only it STILL takes about 5 minutes to run. Now I know it has nothing to do with dependencies on the pathLookupVectors. Also, I know it's not a memory thing where the computer starts writing to the hard drive because while the slow program is chugging away running the method, I can open another instance of Visual Studio and run the single method program at the same time which completes in seconds. I realise that the problem might be some basic settings, but I'm not experienced so apologies if this does disappointingly end up being the reason why. I still don't have a clue why it's taking so much longer.

Upvotes: 11

Views: 1867

Answers (9)

Timo
Timo

Reputation: 5176

I profiled the code with Very Sleepy (Visual C++ 2010, 32-bit Windows XP). I don't know how similar my input data was, but here are the results anyway:

39% of the time was spent in basic_istream::operator>>

12% basic_iostream::basic_iostream

9% operator+

8% _Mutex::Mutex

5% getline

5% basic_stringbuf::_Init

4% locale::_Locimp::_Addfac

4% vector::reserve

4% basic_string::assign

3% operator delete

2% basic_Streambuf::basic_streambuf

1% Wcsxfrm

5% other functions

Some of the stuff seems to be from inlined calls so it's a bit difficult to say where it actually comes from. But you can still get the idea. The only thing that should do I/O here is getline and that takes only 5%. The rest is overhead from stream and string operations. C++ streams are slow as hell.

Upvotes: 9

Kleist
Kleist

Reputation: 7985

Since your function itself is not slow1, the reason for the program being slow must be that some code that uses the product of this function becomes slower when pathLookupVectors has been populated.

I think running a profiler on your program would be the best way to find out, but you could also look through your code and find every piece of code that depends on pathLookupVectors and consider if it could be the bottleneck you're searching for.

1. Established in your latest edit.

Upvotes: 1

Daniel Mošmondor
Daniel Mošmondor

Reputation: 19956

If you have extremely large number of elements, you'll be punished with re-allocation and copy every time vector is pushed-back. Try using a different container in C++.

Upvotes: 1

Tom Kerr
Tom Kerr

Reputation: 10720

Here are a few things that I haven't seen anyone else mention. They are somewhat vague, but being unable to reproduce things makes it hard to go into specifics on all of it.

Poor man's profiling.

While the code is running, just keep interrupting it. Usually you'll see the same stack frame over and over.

Start commenting stuff out. If you comment out your splitting and it completes instantly, then its pretty clear where to start.

Some of the code is dependent, but you could read the full file into memory then do the parsing to create an obvious separation on where its spending its time. If both finish quickly independently, then it's probably interaction.

Buffering.

I don't seen any buffering on your reads. This becomes especially important if you are writing anything to disk. The arm on your disk will jump back and forth between your read location, then write location, etc.

While it doesn't look like you are writing here, your main program may have more memory being used. It is possible that after you reach your high water, the OS starts paging some of the memory to disk. You'll thrash when you are reading line by line while the paging is happening.

Usually, I'll set up a simple iterator interface to verify everything is working. Then write a decorator around it to read 500 lines at a time. The standard streams have some buffering options built in as well, and those may be better to use. I'm going to guess that their buffering defaults are pretty conservative.

Reserve.

std::vector::push_back works best when you also use std::vector::reserve. If you can make most of the memory is available before entering a tight loop, you win. You don't even have to know exactly how much, just guess.

You can actually beat std::vector::resize performance with this as well, because std::vector::resize uses alloc and std::vector::push_back will use realloc

That last bit is contested, though I've read otherwise. I have no reason to doubt that I'm wrong, though I will have to do more research to confirm or deny.

Nevertheless, push_back can run faster if you use reserve with it.

String splitting.

I've never seen a C++ iterator solution that was performant when it comes to dealing with gb+ files. I haven't tried that one specifically, though. My guess at why is that they tend to make a lot of small allocations.

Here is a reference with what I usually use.

Split array of chars into two arrays of chars

Advice on std::vector::reserve applies here.

I prefer boost::lexical_cast to stream implementations for maintenance concerns, though I can't say its more or less performant than stream implementations. I will say it is exceedingly rare to actually see correct error checking on stream usage.

STL shenanigans.

I'm intentionally vague on these, sorry. I usually write code that avoids the conditions, though I do remember some of the trials and tribulations that co-workers have told me about. Using STLPort avoids a good chunk of these entirely.

On some platforms, using stream operations have some weird thread safety enabled by default. So I've seen minor std::cout usage absolutely destroy an algorithm's performance. You don't have anything here, but if you had logging going on in another thread that could pose problems. I see a 8% _Mutex::Mutex in another comment, which may speak to its existence.

It's plausible that a degenerate STL implementation could even have the above issue with the lexical parsing stream operations.

There are odd performance characteristics on some of the containers. I don't I ever had problems with vector, but I really have no idea what istream_iterator uses internally. In the past, I've traced through an misbehaving algorithm to find a std::list::size call doing full traversal of the list with GCC, for instance. I don't know if newer versions are less inane.

The usual stupid SECURE_CRT stupidity should stupidly be taken care of. I wonder if this is what microsoft thinks we want to spend our time doing?

Upvotes: 4

Miguel Grinberg
Miguel Grinberg

Reputation: 67479

Based on your update it is pretty clear that the function you posted by itself is not causing the performance problem, so while there are many ways in which you can optimize it it seems that is not going to help.

I presume you can reproduce this performance problem every time you run your code, correct? Then I would like to suggest that you do the following tests:

  • if you are compiling your program in debug mode (i.e. no optimizations), then recompile for release (full optimizations, favoring speed) and see if that makes a difference.

  • To check if the extra time is spent on this suspected function you can add printf statements at the start and end of the function that include timestamps. If this is not a console app but a GUI app and printfs are not going anywhere, then write to a log file. If you are on Windows, you can alternatively use OutputDebugString and use a debugger to capture the printfs. If you are on Linux, you can write to the system log using syslog.

  • Use a source code profiler to determine where is all that time spent. If the difference between calling this function or not is several minutes, then a profiler will surely give a clue as to what is happening. If you are on Windows, then Very Sleepy is a good choice, and if you are on Linux you can use OProfile.

Update: So you say that a release build is fast. That likely means that the library functions that you use in this function have slow debug implementations. The STL is know to be that way.

I'm sure you need to debug other parts of your application and you don't want to wait all those minutes for this function to complete in debug mode. The solution to this problem is to build your project in release mode, but change the release configuration in the following way:

  • disable optimizations only for the files you want to debug (make sure optimizations remain enabled at least for the file that has the slow function). To disable optimizations on a file, select the file in the Solution Explorer, right click, select Properties, then go to Configuration Properties|C/C++/Optimization. Look at how all the items in that page are set for the Debug build, and copy all of those in your Release build. Repeat for all the files that you want to be available to the debugger.

  • enable debugging info (the pdb file) to be generated. To do this, select the Project at the top of the Solution Explorer, right click, select Properties. Then go to Configuration Properties|Linker|Debugging and copy all the settings from the Debug build into the Release build.

With the above changes you will be able to debug the parts of the release binary that were configured as above just like you do it in the debug build.

Once you are done debugging you will need to reset all those settings back, of course.

I hope this helps.

Upvotes: 7

Sarfaraz Nawaz
Sarfaraz Nawaz

Reputation: 361252

The whileloop in your code seems to be very messy and long, as it is doing things in a way which is not needed:

A simple and fast equivalent code would be this:

int result;
stringstream ss(line);
while ( ss >> result ) //reads all ints untill it encounters non-int
{
    pathLookupVectors[i][j].push_back(result);
}

In C++, such loop is idiomatic as well. Or instead of this manual loop, you could write use std::copy 1:

std::copy(std::istream_iterator<int>( ss ), 
          std::istream_iterator<int>(), 
          std::back_inserter(pathLookupVectors[i][j]));

1. It is taken from @David's comment.

Or even better if you do this, when you push_back the vector itself:

 if (getline(myFile, line)) //enter if a line is read successfully
 {
   stringstream ss(line);
   std::istream_iterator<int> begin(ss), end;
   pathLookupVectors[i].push_back(vector<int>(begin, end));
 }

Done!

Upvotes: 7

hamstergene
hamstergene

Reputation: 24429

Both List.Add and vector::push_back reallocate memory from time to time as the container grows. C++ vector stores subvectors by value, so all their data (which seem to be huge in your case) is copied again and again. In contrast, C# list stores sublists by reference, so sublists' data is not copied during reallocation.

Typical vector implementation doubles its capacity during reallocation. So if you have 1 million of lines, subvectors will be copied log(2,1000000) ≈ 10 times.

Move semantics introduced in C++11 is supposed to eliminate this effect. Until that, try vector< shared_ptr< vector<int> > >, list< vector<int> >, or, if you know future size in advance, use vector::reserve() to avoid reallocations.

Upvotes: 3

Donald Miner
Donald Miner

Reputation: 39883

I'm not exactly sure what is going on here, but I see a few ways in which you can optimize your code. If this doesn't get you there, then there might be something else going on.


How big are your strings? As you are passing them in your C++ version, you are making copies because you are "passing by value". Try passing it by constant reference:

void Level::PopulatePathVectors(const string &pathTable)

This passes the object by reference, meaning it is not making a copy. Then, it is customary to make it const to ensure that it is not getting modified in your function.


Use .append or += to extend tempString. I believe you are making a new string object, then replacing the old one with just +, while += and .append are going to modify the current one in place:

tempString.append(line[count]);

You can also tweak out a bit more performance by declaring your variables at the top and then reassigning into them. This will prevent them from getting recreated every time. For example, put string line; before your for-loop, because it's going to get overwritten anyways.

There are a few places you can do this, such as with tempString.

Upvotes: 4

Ilian
Ilian

Reputation: 5355

Haven't tested the code but how many ints does it typically load? Consider what happens when each of your vectors reaches its capacity. A vector grows inefficiently - O(n) I believe. C#'s List doesn't have this behaviour.

Consider using std::deque, std::list or some other container that has better growth behaviour. See this article for more info.

Upvotes: 1

Related Questions