Reputation: 11
I have to read very large text files in Qt, up to 3 GB, and store them as a collection of lines (to work with them later). I know the lines all have a very similar size, so I estimate the number of lines and resize the vector before reading the file. But I still get a bad_alloc at about 3,000,000 lines or ~916 MB of reserved RAM. At the time the program crashes, not a single push_back has been called, because for a 136 MB file my code resizes the vector to more than 7,000,000 elements.
I am running Windows 10 x64 with 8 GB of RAM, of which 4.9 GB are free.
This is my attempt:
QString filepath = "K://_test//test.txt";
QFile qfile(filepath);
if (!qfile.open(QIODevice::ReadOnly | QIODevice::Text)) {
return false;
}
// All lines have similar size, so try to calculate the amount from filesize
QFileInfo info(qfile);
long size = info.size() / 1024; // in kb
size = size / 0.0453333; // Cutting decimals is ok at this amount
std::vector<QString> result;
if (size > 0) {
// Replaced: result.resize(size);
result.reserve(size);
}
//Reading
QTextStream in(&qfile);
QString line = "";
long cnt = 0;
while (!in.atEnd()) {
line = in.readLine();
if (line.isEmpty() == false)
{
result.push_back(line);
/**Replaced:
if (cnt > (size - 1)) {
result.push_back(line);
}
else {
result.at(cnt) = line;
}*/
cnt++;
}
}
// Removed: result.shrink_to_fit();
file->setLines(result);
// file is a object with only the filepath and the lines in it.
Edit: I just figured something out. I (have to) use QML, and my QML creates the class instance where the file is read. If I read the file from the main method without loading a .qml file, no bad_alloc occurs. If I load the QML and read the file, Qt says there is not enough memory to load the QML libraries.
Edit 2: So, without QML the crash occurs at 8,000,000 lines and 1.5 GB of reserved space.
Edit 3: I updated the code above to the current state.
Upvotes: 1
Views: 421
Reputation: 106196
result.resize(size);
I think you want reserve(size) there, as resize() does the equivalent of push_back-ing size empty strings.
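To make the difference concrete, here is a minimal sketch using only the standard library. It assumes std::string as a stand-in for QString, and the names Counts, afterReserve and afterResize are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Snapshot of a vector's element count and allocated capacity.
struct Counts {
    std::size_t size;
    std::size_t capacity;
};

// reserve(n) only allocates room for n elements; none are constructed.
Counts afterReserve(std::size_t n)
{
    std::vector<std::string> lines;
    lines.reserve(n);
    assert(lines.empty()); // still no elements, push_back starts at index 0
    return {lines.size(), lines.capacity()};
}

// resize(n) constructs n empty strings, much like n push_back calls.
Counts afterResize(std::size_t n)
{
    std::vector<std::string> lines;
    lines.resize(n);
    return {lines.size(), lines.capacity()};
}
```

After reserve() the vector is still empty, so every later push_back lands in the pre-allocated space. After resize() it already holds n empty strings, and any push_back appends after them.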
Further, keep in mind that the vector simply holds the fixed-size QString string-management objects: they presumably contain pointers, and when actual text is assigned into them, they'll dynamically allocate memory in which to store that text. That's very likely where your bad_alloc is coming from. Such allocations must be expected inside in.readLine();.
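As a rough back-of-the-envelope sketch of what that means for total memory: the function below adds the fixed-size object in the vector, a per-allocation overhead, and the text itself. The name estimateLineStorage and the handle and overhead figures are assumptions for illustration, not measured Qt values. The one firm input is that QString stores its text as UTF-16, so two bytes per character.

```cpp
#include <cassert>
#include <cstdint>

// Rough estimate of the bytes needed to keep every line as its own string.
//   handleBytes   : size of the fixed-size string object held in the vector (assumed)
//   allocOverhead : bookkeeping per heap allocation (assumed)
//   bytesPerChar  : 2 for UTF-16 text, as stored by QString
std::uint64_t estimateLineStorage(std::uint64_t lineCount,
                                  std::uint64_t avgCharsPerLine,
                                  std::uint64_t handleBytes,
                                  std::uint64_t allocOverhead,
                                  std::uint64_t bytesPerChar = 2)
{
    assert(bytesPerChar > 0); // sanity check on the encoding width
    const std::uint64_t perLine =
        handleBytes + allocOverhead + bytesPerChar * avgCharsPerLine;
    return lineCount * perLine;
}
```

With made-up figures of an 8-byte handle and 24 bytes of overhead, 7,000,000 lines of about 46 characters come to 7,000,000 * 124 = 868,000,000 bytes. Only 56,000,000 of those are what reserve() accounts for.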
You should probably get rid of this...
result.shrink_to_fit();
...as an implementation might try to copy the strings from the existing buffer to one exactly and only large enough, and in so doing temporarily need even more memory.
If you want to retain huge amounts of text in memory with extremely low overhead, I suggest you memory-map the file. You can keep a vector of pointers to the first character of each line if that's useful for you.
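A minimal sketch of the line-index idea, assuming C++17 and that the file contents are already available as one contiguous byte buffer (in Qt that buffer could come from QFile::map(), which is not shown here). The function name indexLines is made up for this illustration, and std::string_view is used in place of raw pointers so that each entry also carries the line length.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Builds one view per non-empty line of `buffer`. No text is copied:
// every view points into the original buffer, which must outlive the result.
std::vector<std::string_view> indexLines(std::string_view buffer)
{
    std::vector<std::string_view> lines;
    std::size_t start = 0;
    while (start < buffer.size()) {
        std::size_t end = buffer.find('\n', start);
        if (end == std::string_view::npos)
            end = buffer.size(); // last line without a trailing newline
        assert(end >= start);
        std::string_view line = buffer.substr(start, end - start);
        // Drop a trailing '\r' so Windows line endings are handled as well.
        if (!line.empty() && line.back() == '\r')
            line.remove_suffix(1);
        if (!line.empty()) // skip empty lines, as the question's loop does
            lines.push_back(line);
        start = end + 1;
    }
    return lines;
}
```

Each entry costs only a pointer and a length, and the text stays in the mapped buffer. Note that the bytes are left in the file's own encoding, so a line has to be converted (for example to a QString) only when it is actually used.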
Upvotes: 2