DanielX2010

Reputation: 1908

Writing huge txt files without overloading RAM

I need to write the results of a process to a txt file. The process is very long and the amount of data to be written is huge (~150 GB). The program works fine, but the problem is that RAM gets overloaded and, at a certain point, the program just stops.

The program is simple:

ofstream f;
f.open(filePath);
for(int k=0; k<nDataset; k++){
    //treat element of dataset
    f << result;
}
f.close();

Is there a way of writing this file without overloading the memory?

Upvotes: 2

Views: 1053

Answers (3)

Jiminion

Reputation: 5168

You should flush output periodically.

For example:

if (k%10000 == 0) f.flush(); 
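
Applied to the loop from the question, that looks roughly like the sketch below (filePath, nDataset and the per-element result here are placeholders standing in for the question's real data):

#include <fstream>
#include <string>

int main() {
    const char* filePath = "results.txt";   // placeholder output path
    const int nDataset = 1000000;            // placeholder dataset size

    std::ofstream f(filePath);
    for (int k = 0; k < nDataset; k++) {
        std::string result = std::to_string(k) + "\n";  // stand-in for the real per-element result
        f << result;

        if (k % 10000 == 0)
            f.flush();   // push buffered output to the file periodically, as suggested above
    }
    f.close();
    return 0;
}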

Upvotes: 5

Ben Voigt

Reputation: 283961

If that truly is the code where your program gets stuck, then your explanation of the problem is wrong.

  • There's no text file. Your igzstream is not dealing with text but with a gzip archive.

  • There's no data being written. The code you show reads from the stream.

  • I don't know what your program does with result, because you didn't show that. But if it accumulates results into a collection in memory, that will grow. You'll need to find a way to process all your data without loading all of it into RAM at the same time (a streaming sketch follows this list).

  • Your memory usage could be from the decompressor. For some compression algorithms, an entire block has to be stored in memory. In such cases it's best to break the file into blocks and compress each separately (possibly pre-initializing a dictionary with the results of the previous block). I don't think that gzip is such an algorithm, however. You may need to find a library that supports streaming.
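
Along the lines of the last two points, here is a minimal streaming sketch (not from the original answer): it assumes zlib is available and the input is a line-oriented gzip file with placeholder paths, and it processes one record at a time and writes it out immediately, so memory use stays roughly constant:

#include <zlib.h>

int main() {
    gzFile in  = gzopen("input.gz", "rb");    // placeholder input path
    gzFile out = gzopen("output.gz", "wb");   // placeholder output path
    if (in == Z_NULL || out == Z_NULL)
        return 1;

    char line[1 << 16];                        // holds one record at a time, never the whole file
    while (gzgets(in, line, (int)sizeof line) != Z_NULL) {
        // ... derive the result for this record here (placeholder for the real processing) ...
        gzputs(out, line);                     // write it out immediately; memory use stays flat
    }

    gzclose(in);
    gzclose(out);
    return 0;
}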

Upvotes: 1

ubi

Reputation: 4399

I'd like to suggest something like this:

ogzstream f;                       // gzip output stream from the gzstream library
f.open(filePath);

string s;                          // in-memory buffer for the results
for (int k = 0; k < nDataset; k++) {
    //treat element of dataset

    s.append(result);

    if (s.length() >= OPTIMUM_BUFFER_SIZE) {   // >= : appending result may jump past the exact size
        f << s;                    // hand the buffered chunk to the stream
        f.flush();                 // force it out to the actual file
        s.clear();                 // reuse the buffer
    }
}

f << s;                            // write whatever is left in the buffer
f.flush();
f.close();

Basically, you build the output up in an in-memory buffer rather than sending every result straight to the stream, so you don't have to worry about when the stream gets flushed. And when you do write to the stream, you make sure it is flushed to the actual file. Some ideas for choosing OPTIMUM_BUFFER_SIZE can be found here and here.

I'm not exactly sure whether string or vector is the best option for the buffer. I'll do some research myself and update the answer, or you can refer to Effective STL by Scott Meyers.
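
For comparison, here is the same buffering idea with a std::vector<char> instead of a string (a sketch, not from the original answer; it keeps the ogzstream, filePath, nDataset, result and OPTIMUM_BUFFER_SIZE placeholders from the snippet above and assumes result is a std::string):

ogzstream f;
f.open(filePath);

std::vector<char> buf;
buf.reserve(OPTIMUM_BUFFER_SIZE);              // allocate the buffer capacity once up front

for (int k = 0; k < nDataset; k++) {
    //treat element of dataset

    buf.insert(buf.end(), result.begin(), result.end());

    if (buf.size() >= OPTIMUM_BUFFER_SIZE) {
        f.write(buf.data(), buf.size());       // ogzstream derives from std::ostream
        f.flush();
        buf.clear();                           // size goes to 0, capacity is kept
    }
}

if (!buf.empty())
    f.write(buf.data(), buf.size());           // write whatever is left over
f.flush();
f.close();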

Upvotes: 2
