Arnaugir

Reputation: 108

Writing data to a file in C++ - most efficient way?

In my current project I'm dealing with a large amount of data that is generated on the fly inside a "while" loop. I want to write the data to a CSV file, and I don't know which is better: should I store all the values in a vector and write them to the file at the end, or write to the file in every iteration?

I guess the first option is better, but I'd like a detailed answer if possible. Thank you.
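For reference, the first option (collect everything in a std::vector, then write the file in one pass at the end) might look like the minimal sketch below. The Sample struct, the generate function and the i * 0.5 computation are hypothetical stand-ins for the real data and the real while loop. Memory use with this option grows with the number of rows.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical row type: the question does not say what each record contains.
struct Sample {
    int step;
    double value;
};

// Stand-in for the question's while loop: generate n rows and keep them all in memory.
std::vector<Sample> generate(int n) {
    std::vector<Sample> rows;
    rows.reserve(n > 0 ? static_cast<std::size_t>(n) : 0);
    int i = 0;
    while (i < n) {
        rows.push_back({i, i * 0.5});  // made-up computation
        ++i;
    }
    return rows;
}

// Option 1: write everything to the CSV file in one pass at the end.
bool write_all_at_end(const std::vector<Sample>& rows, const std::string& path) {
    std::ofstream out(path);
    if (!out) return false;
    out << "step,value\n";
    for (const Sample& s : rows) {
        out << s.step << ',' << s.value << '\n';
    }
    out.close();  // final flush happens here
    return !out.fail();
}
```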

Upvotes: 6

Views: 5080

Answers (3)

Thomas Matthews

Reputation: 57749

The most efficient method to write to a file is to reduce the number of write operations and increase the data written per operation.

Given a buffer of 512 bytes, the least efficient method is to write it one byte at a time, using 512 write operations. A more efficient method is to write all 512 bytes in a single operation.
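To make the "one byte per write" versus "one 512-byte write" comparison concrete, here is a sketch using C stdio with buffering disabled via std::setvbuf, so on typical implementations each std::fwrite call becomes its own request to the operating system. The function name write_in_chunks is hypothetical, and no timings are claimed here; measure on your own system.

```cpp
#include <cstddef>
#include <cstdio>

// Writes `size` bytes from `data` to `path` in pieces of `chunk` bytes.
// Buffering is disabled, so chunk == 1 is the "one byte per write" case and
// chunk == size is a single write. Returns the number of write calls, or -1 on error.
long write_in_chunks(const char* path, const char* data, std::size_t size, std::size_t chunk) {
    if (chunk == 0) return -1;
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return -1;
    std::setvbuf(f, nullptr, _IONBF, 0);  // must be called before any other I/O on f
    long calls = 0;
    for (std::size_t off = 0; off < size; off += chunk) {
        std::size_t n = (size - off < chunk) ? size - off : chunk;
        if (std::fwrite(data + off, 1, n, f) != n) {
            std::fclose(f);
            return -1;
        }
        ++calls;
    }
    return std::fclose(f) == 0 ? calls : -1;
}
```

Timing write_in_chunks(path, buf, 512, 1) against write_in_chunks(path, buf, 512, 512) shows the per-call overhead the answer describes.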

There is overhead associated with each call that writes to a file. That overhead consists of locating the file in the drive's catalog, seeking to a new location on the drive, and then writing. The actual writing is quite fast; it's the seeking and waiting for the hard drive to spin up and get ready that wastes your time. So spin it up once, keep it spinning by writing a lot of data, then let it spin down. The more data written while the platters are spinning, the more efficient the write will be.

  • Yes, there are caches everywhere along the data path, but they all work more efficiently with larger data sizes.

I would recommend writing the formatted data to a text buffer (whose size is a multiple of 512 bytes) and, at certain points, flushing the buffer to the hard drive. (Hard drive sector sizes are commonly 512 bytes or a multiple of it.)
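A minimal sketch of that buffered approach, assuming each row is an (int, double) pair: rows are formatted into a std::string, and every complete block of block_size bytes (512 or a multiple of it) is passed to std::ofstream::write in one call; the final partial block is written when the object is destroyed. The class CsvBuffer and its methods are hypothetical. Keep in mind that std::ofstream already has its own internal buffer, so this mainly controls the size of each write request.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: formats CSV rows into a text buffer and passes them to the
// file in fixed-size blocks (block_size should be 512 or a multiple of it).
class CsvBuffer {
public:
    CsvBuffer(const std::string& path, std::size_t block_size)
        : out_(path, std::ios::binary),  // binary: bytes in buf_ == bytes in the file
          block_(block_size > 0 ? block_size : 512) {
        buf_.reserve(2 * block_);
    }
    ~CsvBuffer() { flush(); }  // write the final partial block

    bool ok() const { return static_cast<bool>(out_); }
    std::size_t blocks_written() const { return blocks_; }

    void add_row(int step, double value) {
        buf_ += std::to_string(step);
        buf_ += ',';
        buf_ += std::to_string(value);
        buf_ += '\n';
        // Hand every complete block to the file with a single write call.
        while (buf_.size() >= block_) {
            out_.write(buf_.data(), static_cast<std::streamsize>(block_));
            buf_.erase(0, block_);
            ++blocks_;
        }
    }

    // Write whatever is left (less than one block) and flush the stream.
    void flush() {
        if (!buf_.empty()) {
            out_.write(buf_.data(), static_cast<std::streamsize>(buf_.size()));
            buf_.clear();
        }
        out_.flush();
    }

private:
    std::ofstream out_;
    std::size_t block_;
    std::string buf_;
    std::size_t blocks_ = 0;
};
```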

If you like threads, you can create a thread that monitors the output buffer. When the buffer reaches a threshold, the thread writes its contents to the drive. Multiple buffers can help: the fast processor fills some buffers while others are being written to the slow drive.
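The threaded variant can be sketched with the calling thread filling one buffer while a background thread writes a previously filled buffer to disk. This is a minimal sketch, not a tuned implementation: the class AsyncCsvWriter, the threshold parameter, and the policy of allowing at most one full buffer waiting for the writer are all assumptions. The caller only blocks when it fills a buffer before the writer has picked up the previous one.

```cpp
#include <condition_variable>
#include <cstddef>
#include <fstream>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical multi-buffer writer: the caller fills front_, a background
// thread writes full buffers to disk, so computation and disk I/O overlap.
class AsyncCsvWriter {
public:
    AsyncCsvWriter(const std::string& path, std::size_t threshold)
        : out_(path, std::ios::binary), threshold_(threshold),
          worker_(&AsyncCsvWriter::run, this) {}  // worker_ is declared last

    ~AsyncCsvWriter() {
        {
            std::unique_lock<std::mutex> lock(m_);
            cv_.wait(lock, [this] { return pending_.empty(); });
            pending_.swap(front_);  // hand over the last, partially filled buffer
            done_ = true;
        }
        cv_.notify_all();
        worker_.join();
    }

    void add_line(const std::string& line) {
        front_ += line;
        front_ += '\n';
        if (front_.size() < threshold_) return;
        std::unique_lock<std::mutex> lock(m_);
        // Block only if the writer has not yet picked up the previous full buffer.
        cv_.wait(lock, [this] { return pending_.empty(); });
        pending_.swap(front_);  // front_ is now the (empty) old pending_ buffer
        lock.unlock();
        cv_.notify_all();
    }

private:
    void run() {
        std::string local;
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return !pending_.empty() || done_; });
            if (pending_.empty()) break;  // done_ is set and nothing is left
            local.swap(pending_);         // take the full buffer; pending_ becomes empty
            lock.unlock();
            cv_.notify_all();             // let the caller hand over the next buffer
            out_.write(local.data(), static_cast<std::streamsize>(local.size()));
            local.clear();
            lock.lock();
        }
    }

    std::ofstream out_;
    std::size_t threshold_;
    std::mutex m_;
    std::condition_variable cv_;
    std::string front_;    // filled by the caller's thread only
    std::string pending_;  // full buffer waiting for the writer (guarded by m_)
    bool done_ = false;    // guarded by m_
    std::thread worker_;   // must stay last: it starts running in the constructor
};
```

Because there is a single writer thread that finishes one buffer before taking the next, lines reach the file in the order they were added.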

If your platform has DMA, you might be able to speed things up by having the DMA controller write the data for you, although I would expect a good driver to do this automatically.

I use this technique on an embedded system, with a UART (RS232 port) instead of a hard drive. Thanks to the buffering, I get about 80% efficiency.
(Loop unrolling may also help.)

Upvotes: 0

AlbertoAlegria

Reputation: 9

The easiest way is to use the shell's > redirection operator. On Linux:

./miProgram > myData.txt

This takes the program's standard output and writes it to the file, so the program has to print its data to standard output (for example with std::cout).
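A sketch of the program side of this approach, assuming a hypothetical emit_csv function and a made-up computation (i * i): the rows are simply written to an output stream, which is std::cout when the shell redirects the output.

```cpp
#include <iostream>

// Writes a header plus `n` CSV rows to `out` (std::cout when the shell redirects
// output with >). The row contents are a made-up example computation.
void emit_csv(std::ostream& out, int n) {
    out << "step,square\n";
    int i = 0;
    while (i < n) {
        out << i << ',' << i * i << '\n';  // '\n' does not force a flush; std::endl would
        ++i;
    }
}
```

From main you would call emit_csv(std::cout, n) and then run ./miProgram > myData.txt. Calling std::ios::sync_with_stdio(false) at the start of main may also help, since it lets the C++ streams buffer independently of C stdio.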

Sorry for the English :)

Upvotes: -1

Ben Voigt

Reputation: 283883

Make sure that you're using an I/O library with buffering enabled, and then write every iteration.

This way your computer can start doing disk access in parallel with the remaining computations.

PS. Don't do anything crazy like flushing after each write, or opening and closing the file each iteration. That would kill efficiency.
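A minimal sketch of this advice, assuming a hypothetical generate_csv function and a made-up per-step computation (step * 0.25): the file is opened once outside the loop, each iteration writes one row, and the stream's own buffer decides when data actually reaches the operating system.

```cpp
#include <fstream>
#include <string>

// Option 2: open the file once and write one row per loop iteration,
// relying on std::ofstream's internal buffer to batch the actual writes.
bool generate_csv(const std::string& path, int steps) {
    std::ofstream out(path);          // opened once, outside the loop
    if (!out) return false;
    out << "step,value\n";
    int step = 0;
    while (step < steps) {
        double value = step * 0.25;   // stand-in for the real computation
        out << step << ',' << value << '\n';  // '\n', not std::endl: no flush per row
        ++step;
    }
    out.close();                      // final flush happens here
    return !out.fail();
}
```

Replacing '\n' with std::endl would flush the stream on every row, which is exactly the per-write flushing warned against above.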

Upvotes: 3
