Reputation: 2086
I was dealing with some performance issues which I discussed in this question: Super Slow C++ For Loop
I have a simple program I wrote to parse binary data. I tested it locally on 2 computers.
1. Dual 6 core 2.4GHz Xeon V3, 64GB RAM, NVMe SSD
2. Dual 4 core 3.5GHz Xeon V3, 64GB RAM, NVMe SSD
Here is some of the code (the rest is on Wandbox: https://wandbox.org/permlink/VIvardJNAMKzSbMf):
string HexRow = "";
for (int i = b; i < HexLineLength + b; i++) {
    HexRow += incomingData[i];
}
std::vector<unsigned char> BufferedLine = HexToBytes(HexRow);
stopwatch<> sw;
for (int i = 0; 80 >= i; ++i)
{
    Byte ColumnBytes;
    for (auto it = columns["data"][i].begin(); it != columns["data"][i].end(); ++it)
    {
        try {
            if (it.key() == "Column") { ColumnBytes.Column = it.value().get<std::string>(); }
            else if (it.key() == "DataType") { ColumnBytes.DataType = it.value().get<std::string>(); }
            else if (it.key() == "StartingPosition") { ColumnBytes.StartingPosition = it.value().get<int>(); }
            else if (it.key() == "ColumnWidth") { ColumnBytes.ColumnWidth = it.value().get<int>(); }
        }
        catch (...) {}
    }
    char* locale = setlocale(LC_ALL, "UTF-8");
    std::vector<unsigned char> CurrentColumnBytes(ColumnBytes.ColumnWidth);
    int arraySize = CurrentColumnBytes.size();
    for (int C = ColumnBytes.StartingPosition; C < ColumnBytes.ColumnWidth + ColumnBytes.StartingPosition; ++C)
    {
        int Index = C - ColumnBytes.StartingPosition;
        CurrentColumnBytes[Index] = BufferedLine[C - 1];
    }
}
std::cout << "Elapsed: " << duration_cast<double>(sw.elapsed()) << '\n';
Compiling on PC 1 with Visual Studio using the following flags:
/O2 /JMC /permissive- /MP /GS /analyze- /W3 /Zc:wchar_t /ZI /Gm- /sdl /Zc:inline /fp:precise /D "_CRT_SECURE_NO_WARNINGS" /D "_MBCS" /errorReport:prompt /WX- /Zc:forScope /Gd /Oy- /MDd /std:c++17 /FC /Fa"Debug\" /EHsc /nologo /Fo"Debug\" /Fp"Debug\Project1.pch" /diagnostics:column
Output:
Elapsed: 0.0913771
Elapsed: 0.0419886
Elapsed: 0.042406
Using Clang with the following: clang main.cpp -O3
outputs:
Elapsed: 0.036262
Elapsed: 0.0174264
Elapsed: 0.0170038
Compiling with GCC 8.1.0 from MinGW (i686-posix-dwarf-rev0, built by the MinGW-W64 project), using gcc main.cpp -lstdc++ -O3, gives the following times:
Elapsed: 0.019841
Elapsed: 0.0099643
Elapsed: 0.0094552
On PC 2 with Visual Studio, still with /O2, I get:
Elapsed: 0.054841
Elapsed: 0.03543
Elapsed: 0.034552
I didn't do Clang and GCC on PC 2, but the improvement wasn't significant enough to resolve my concerns.
The issue is that the exact same code on Wandbox (https://wandbox.org/permlink/VIvardJNAMKzSbMf) executes 10-80 times faster:
Elapsed: 0.00115457
Elapsed: 0.000815412
Elapsed: 0.000814636
Wandbox is using GCC 10.0.0 and C++14. I realize it is likely running on Linux, and I couldn't find any way to get GCC 10 on Windows, so I can't test compiling with that version locally.
This is a rewrite of a C# application I wrote, which operates so much faster:
Elapsed: 0.017424
Elapsed: 0.0006065
Elapsed: 0.000733
Elapsed: 0.0006166
Elapsed: 0.0004699
Finished Parsing: 100 Records. Elapsed :0.0082796 at a rate of : 12076/s
The C# method looks like this:
Stopwatch sw = new Stopwatch();
sw.Start();
foreach (dynamic item in TableData.data) // TableData is a JSON file with the structure definition
{
    string DataType = item.DataType;
    int startingPosition = item.StartingPosition;
    int width = Convert.ToInt32(item.ColumnWidth);
    if (width + startingPosition >= FullLineLength)
    {
        continue;
    }
    byte[] currentColumnBytes = currentLineBytes.Skip(startingPosition).Take(width).ToArray();
    // ..... 200 extra lines of processing into ints, dates, strings ......
    // ..... Even with the extra work, it operates at 1200+ records per second ......
}
sw.Stop();
var seconds = sw.Elapsed.TotalSeconds;
sw.Reset();
Console.WriteLine("Elapsed: " + seconds);
TempTable.Rows.Add(dataRow);
When I started this, I expected huge performance gains from moving the code from C# to unmanaged C++. This is my first C++ project, and frankly I am a bit discouraged about where I am. What can be done to speed up this C++? Do I need different datatypes, malloc, more or fewer structs?
It needs to run on Windows; is there a way to get GCC 10 to work on Windows?
What suggestions do you have for an aspiring C++ Developer?
Upvotes: 0
Views: 333
Reputation: 2086
Ok, so I was able to get C++ processing the file at around 50,000 rows per second with 80 columns per row. I reworked the entire workflow so it never has to backtrack. I first read the entire file into ByteArray, then went over it line by line, moving data from one array to another rather than assigning each byte individually in a for loop. I then used a map to store the results.
stopwatch<> sw;
while (CurrentLine < TotalLines)
{
    int BufferOffset = CurrentLine * LineLength;
    std::move(ByteArray + BufferOffset, ByteArray + BufferOffset + LineLength, LineByteArray);
    for (int i = 0; TotalColumns > i + 1; ++i)
    {
        int ThisStartingPosition = StartingPosition[i];
        int ThisWidth = ColumnWidths[i];
        // A vector frees its buffer each iteration; the earlier
        // `new uint8_t[ThisWidth]` was never deleted and leaked one
        // allocation per column.
        std::vector<std::uint8_t> CurrentColumnBytes(ThisWidth);
        std::move(LineByteArray + ThisStartingPosition,
                  LineByteArray + ThisStartingPosition + ThisWidth,
                  CurrentColumnBytes.data());
        ResultMap[CurrentLine][i] = Format(CurrentColumnBytes.data(), ThisWidth, DataType[i]);
    }
    CurrentLine++;
}
std::cout << "Processed " << CurrentLine << " lines in: " << duration_cast<double>(sw.elapsed()) << '\n';
I am still a little disappointed: the Boost Gregorian calendar conversion is unavailable when compiling with Clang, and using the standard MS compiler makes it nearly 20x slower. With Clang -O3 it was processing 10,700 records in 0.25 seconds, including all the int and string conversions. I will just have to write my own date conversion.
Upvotes: 0
Reputation: 785
It really depends on the commands being executed in assembler/machine code. VS has never been great at C++, and for many years Borland kicked their arses for both efficiency and reliability; then Borland sold their IDE & C++ branch off as a separate company.
It also depends on how you have programmed the process to occur in C++, can you please edit to show that code?
The advantage of C# is that it is managed and may apply a higher-level interpretation of your code. In the background the JIT may convert the whole line to the parsed format and then let the for loop break the chunks off (one step, looped), whereas C++ follows your commands more literally even when they are less efficient, i.e. it breaks off the chunk you are looking at and then converts that chunk to the parsed format (two steps, looped).
So, using the above example: even if we assume C#'s single combined step is somewhat slower than either individual C++ command, the C++ code runs two commands on every loop iteration while the C# code runs only one, so any per-command inefficiency compounds across the loop.
ALSO +1 to doug in the comments above: reference vs value can make a pretty big difference, especially when you are dealing with large datasets. I think his answer is the most likely explanation for large differences.
Simplification is the answer I believe:
std::string byteString = hex.substr(i, 2);
unsigned char byte = (unsigned char) strtol(byteString.c_str(), NULL, 16);
Could become
unsigned char byte = (unsigned char) strtol(hex.substr(i, 2).c_str(), NULL, 16);
and remove a minor memory allocation. But again, if you can convert the entire source to a byte stream first and then run the for loop on that, you remove the conversion step from the loop.
Upvotes: 1