Reputation: 35
I have a 1000-line file of about 400MB containing numeric data stored as strings. I want to transpose the data so that there are only 1000 strings per line (so that I can open it and plot it quickly with pandas).
I read the whole file into a vector of vectors of strings, which I want to transpose (and eventually write back to a file).
I use two nested loops to go through the 2D structure and write it to a std::ofstream. It is very slow. I then tried to focus on the transposition alone and wrote the following code:
//Read the 400MB file (90K strings per line, 1K lines) and store it in
std::vector<std::vector<std::string>> mData;
// ...
// IO the file and populate mData with raw data
// ...

// All rows have the same number of strings
size_t nbRows = mData.size();
size_t nbCols = mData[0].size();

std::vector<std::vector<std::string>> transposedData(nbCols);
for (size_t i = 0; i < nbCols; ++i)
{
    transposedData[i].resize(nbRows);
    for (size_t j = 0; j < nbRows; ++j)
    {
        transposedData[i][j] = mData[j][i];
    }
}
I thought a few seconds would be sufficient, but it takes several minutes. I also tried different input dimensions (only 3 lines and many more strings per line, for the same file size of 400MB) and it is much faster.
EDIT 1
Following people's advice, I profiled the code using callgrind. I got this message during the run: ... brk segment overflow in thread #1 : can't grow to ...
I analysed the results and summarize them here:
25% is spent in operator= of basic_string
21% is spent in the construction of basic_string (with only 3% of the time in new)
14% is spent in operator[] on the outer vector
11% is spent in operator[] on the inner vector
Thank you for your suggestions.
Upvotes: 1
Views: 281
Reputation: 35
There is not enough available memory on my machine to perform this task (see below). Splitting my data into three parts, I completed the task in a few seconds. Here is the output of code that inspects memory:
free ram 2.5GB
IO populating mData with raw data
free ram 0.2GB
Empty string capacity : 15 bytes
Intending to allocate 1.4 GB
terminate called after throwing an instance of 'std::bad_alloc'
  what() : std::bad_alloc
Aborted
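For reference, a minimal sketch of the splitting idea (the chunk count and the write-out step are assumptions, since the actual splitting code is not shown): transpose one slice of the columns at a time, so that only one slice's worth of transposed strings is in memory at once.

// Sketch: transpose the columns of mData in nbChunks passes; writing
// each chunk to the output file between passes (not shown) keeps peak
// memory at roughly one chunk instead of the whole transposed matrix.
const size_t nbChunks = 3;
const size_t chunkWidth = (nbCols + nbChunks - 1) / nbChunks;
for (size_t c = 0; c < nbChunks; ++c)
{
    const size_t begin = c * chunkWidth;
    const size_t end = std::min(begin + chunkWidth, nbCols); // <algorithm>
    std::vector<std::vector<std::string>> chunk(end - begin);
    for (size_t i = begin; i < end; ++i)
    {
        chunk[i - begin].reserve(nbRows);
        for (size_t j = 0; j < nbRows; ++j)
            chunk[i - begin].emplace_back(std::move(mData[j][i]));
    }
    // ... append the chunk's rows to the output file here; chunk is
    // destroyed at the end of the iteration, freeing its memory
}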
Upvotes: 0
Reputation: 119867
The program has redundancies on several levels.
The obvious one is that you don't need to transpose the vector in order to transpose the file.
vector<vector<string>> originalData;
// read the file into originalData
size_t nbRows = originalData.size();
size_t nbCols = originalData[0].size();
for (size_t i = 0; i < nbCols; ++i)
{
    for (size_t j = 0; j < nbRows; ++j)
    {
        cout << originalData[j][i] << " ";
    }
    cout << endl;
}
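Since the goal is to write a transposed file anyway, the same loops can stream straight into the std::ofstream you already use; a sketch (the output file name is made up):

std::ofstream out("transposed.txt"); // <fstream>
for (size_t i = 0; i < nbCols; ++i)
{
    for (size_t j = 0; j < nbRows; ++j)
        out << originalData[j][i] << " ";
    out << '\n'; // '\n' rather than endl avoids flushing the stream after every line
}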
Assuming you do need to produce a transposed vector for some reason, one way to write the transpose loop would be:
vector<vector<string>> transposedData(nbCols);
for (size_t j = 0; j < nbCols; ++j)
{
    transposedData[j].reserve(nbRows);
    for (size_t i = 0; i < nbRows; ++i)
    {
        transposedData[j].emplace_back(originalData[i][j]);
        // if keeping the original vector is not needed ...
        // transposedData[j].emplace_back(std::move(originalData[i][j]));
    }
}
On my (fairly beefy) machine it takes about 6-7 seconds to transpose a 1000x90000 matrix of 3-character strings. This is not particularly impressive, but if you don't need to transpose multi-million-element matrices 24 hours a day, it does what you need without too much overhead.
Upvotes: 1
Reputation: 7482
First of all, before making any claim about why a piece of code is slow, you should really measure its performance on your machine and then, with the data at hand, deduce why.
That said, I am quite confident in this case that the problem resides in the fact that you are allocating 90k vectors of strings, each of size 1k. As you know, memory allocation is costly, and it might explain your performance penalty.
The following is how you could implement your code using only a 1D array allocated upfront:
size_t nbRows = mData.size();
size_t nbCols = mData[0].size();

// flat index of element (row, col) in a row-major matrix whose rows
// have rowLength elements
auto get_idx = [](const size_t row, const size_t rowLength, const size_t col)
{
    return row * rowLength + col;
};

// one upfront allocation instead of 90k inner vectors
std::vector<std::string> transposedData(nbCols * nbRows);
for (size_t i = 0; i < nbCols; ++i)
{
    for (size_t j = 0; j < nbRows; ++j)
    {
        // row i of the transposed matrix has nbRows elements
        const size_t idx = get_idx(i, nbRows, j);
        transposedData[idx] = std::move(mData[j][i]);
    }
}

for (size_t i = 0; i < nbCols; ++i)
{
    for (size_t j = 0; j < nbRows; ++j)
    {
        const size_t idx = get_idx(i, nbRows, j);
        std::cout << transposedData[idx] << " ";
    }
    std::cout << std::endl;
}
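The point of this design is that the single vector is one big allocation instead of 90k inner-vector allocations, and std::move transfers each string's heap buffer instead of copying its characters (strings short enough for the small-string buffer are still copied).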
I would like to stress it again: profile your code. Try out software like valgrind --tool=callgrind or gprof, which let you profile and visualize performance data about your app.
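For example, a typical callgrind session looks like this (assuming your binary is called ./app; the actual output file name includes the process id):

valgrind --tool=callgrind ./app
callgrind_annotate callgrind.out.<pid>

KCachegrind can open the same output file for a graphical view.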
Upvotes: 1
Reputation: 126
The penalty may come from the fact that you're overusing resize in your for loop.
As per the reference:
Complexity
Linear in the difference between the current size and count. Additional complexity possible due to reallocation if capacity is less than count
Memory allocation costs a lot, so you may want to avoid overdoing it.
As pointed out by others, upfront allocation is an interesting approach that avoids recreating (resizing) your vector every time; see the sketch below.
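To illustrate, a minimal sketch of the reserve-then-emplace pattern (reusing the mData/nbRows/i names from the question):

std::vector<std::string> column;
column.reserve(nbRows);               // one allocation, no strings constructed yet
for (size_t j = 0; j < nbRows; ++j)
    column.emplace_back(mData[j][i]); // construct each string in place

// versus: column.resize(nbRows) first default-constructs nbRows empty
// strings, which the subsequent operator= assignments then overwrite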
Upvotes: 0