Fiat Lux

Reputation: 227

The fastest way to write data while producing it

In my program I am simulating an N-body system for a large number of iterations. For each iteration I produce a set of 6N coordinates, which I need to append to a file and then use to run the next iteration. The code is written in C++ and currently uses ofstream's write() method to write the data in binary format at each iteration.
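
For reference, a minimal sketch of the pattern described above (the file name, the use of double, and the loop structure are illustrative assumptions, not the actual code):

    // Per-iteration binary append with ofstream::write(), as described above.
    #include <cstddef>
    #include <fstream>
    #include <vector>

    void simulate(std::size_t N, std::size_t iterations) {
        std::ofstream out("trajectory.bin", std::ios::binary);
        std::vector<double> coords(6 * N);            // e.g. x, y, z, vx, vy, vz per body

        for (std::size_t it = 0; it < iterations; ++it) {
            // ... compute the next state into coords ...
            out.write(reinterpret_cast<const char*>(coords.data()),
                      coords.size() * sizeof(double)); // blocks inside the simulation loop
        }
    }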

I am not an expert in this field, but I would like to improve this part of the program, since I am in the process of optimizing the whole code. I feel that the latency associated with writing the result of the computation at each cycle significantly slows down the performance of the software.

I'm confused because I have no experience with actual parallel programming or low-level file I/O. I have thought of a few techniques I could try, since I am programming for modern (possibly multi-core) machines running Unix.

However, I don't know how best to implement them and combine them appropriately.

Upvotes: 7

Views: 4688

Answers (4)

Offirmo

Reputation: 19840

Of course, writing to a file at each iteration is inefficient and will most likely slow down your computation (as a rule of thumb; it depends on your actual case).

You have to use a producer -> consumer design pattern. They will be linked by a queue, like a conveyor belt.

  • The producer will try to produce as fast as it can, only slowing if the consumer can't handle it.
  • The consumer will try to "consume" as fast as it can.

By splitting the two, you can increase performance more easily, because each process is simpler and has less interference from the other.

  • If the producer is faster, you need to improve the consumer, in your case by writing to the file in the most efficient way, most likely chunk by chunk (as you said).
  • If the consumer is faster, you need to improve the producer, most likely by parallelizing it, as you said.

There is no need to optimize both. Only optimize the slowest (the bottleneck).

Practically, you use threads and a synchronized queue between them. For implementation hints, have a look here, especially §18.12 "The Producer-Consumer Pattern".

About flow management, you'll have to add a little more complexity by selecting a "max queue size" and making the producer(s) wait if the queue does not have enough space. Beware of deadlocks then; code it carefully (see the Wikipedia link I gave about that).
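
Here is a rough sketch of such a bounded queue, using std::thread and std::condition_variable (the boost::thread equivalents are nearly identical); the queue size, file name and payload type are only assumptions:

    // Bounded producer/consumer queue: the simulation pushes chunks, a writer
    // thread pops them and writes to disk. push() blocks when the queue is full.
    #include <condition_variable>
    #include <cstddef>
    #include <fstream>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    class BoundedQueue {
    public:
        explicit BoundedQueue(std::size_t max_size) : max_size_(max_size) {}

        void push(std::vector<double> item) {              // producer waits when full
            std::unique_lock<std::mutex> lock(m_);
            not_full_.wait(lock, [this] { return q_.size() < max_size_ || done_; });
            q_.push(std::move(item));
            not_empty_.notify_one();
        }

        bool pop(std::vector<double>& item) {              // consumer waits when empty
            std::unique_lock<std::mutex> lock(m_);
            not_empty_.wait(lock, [this] { return !q_.empty() || done_; });
            if (q_.empty()) return false;                  // closed and fully drained
            item = std::move(q_.front());
            q_.pop();
            not_full_.notify_one();
            return true;
        }

        void close() {                                     // producer signals end of data
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
            not_empty_.notify_all();
            not_full_.notify_all();
        }

    private:
        std::queue<std::vector<double>> q_;
        std::size_t max_size_;
        bool done_ = false;
        std::mutex m_;
        std::condition_variable not_full_, not_empty_;
    };

    int main() {
        const std::size_t N = 1000, iterations = 10000;    // illustrative sizes
        BoundedQueue queue(16);                            // the "max queue size"

        std::thread writer([&] {                           // consumer: writes chunks
            std::ofstream out("trajectory.bin", std::ios::binary);
            std::vector<double> chunk;
            while (queue.pop(chunk))
                out.write(reinterpret_cast<const char*>(chunk.data()),
                          chunk.size() * sizeof(double));
        });

        for (std::size_t it = 0; it < iterations; ++it) {  // producer: simulation loop
            std::vector<double> coords(6 * N);
            // ... compute the next state into coords ...
            queue.push(std::move(coords));
        }
        queue.close();                                     // let the writer drain and exit
        writer.join();
    }

Having pop() return false only after close() and an empty queue lets the writer drain everything before exiting, which avoids the deadlock pitfalls mentioned above.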

Note: it's a good idea to use Boost threads, because native threads are not very portable (well, they are since C++0x, but C++0x support is not yet widely available).

Upvotes: 3

alanxz

Reputation: 2026

If you don't want to deal with doing the work in a different thread, you could try aio_write(), which allows asynchronous writes. Essentially, you give the OS the buffer to write and the function returns immediately; the OS finishes the write while your program continues, and you can check later whether it has completed.

This solution still suffers from the producer/consumer problem mentioned in the other answers: if your algorithm produces data faster than it can be written, you will eventually run out of memory to hold the results between the computation and the write, so you'd have to try it and see how it works out.
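
For illustration, a rough sketch of a single aio_write() call with POSIX AIO (<aio.h>; link with -lrt on Linux). The file name and buffer size are placeholders and error handling is minimal:

    #include <aio.h>
    #include <cerrno>
    #include <cstring>
    #include <fcntl.h>
    #include <unistd.h>
    #include <vector>

    int main() {
        std::vector<double> data(6 * 1000);               // one iteration's coordinates
        int fd = open("trajectory.bin", O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0) return 1;

        aiocb cb;
        std::memset(&cb, 0, sizeof(cb));
        cb.aio_fildes = fd;
        cb.aio_buf    = data.data();                      // must stay alive until the write finishes
        cb.aio_nbytes = data.size() * sizeof(double);

        aio_write(&cb);                                   // returns immediately

        // ... compute the next iteration here while the OS writes ...

        while (aio_error(&cb) == EINPROGRESS) { /* still in flight; poll or do more work */ }
        ssize_t written = aio_return(&cb);                // bytes written, or -1 on error
        (void)written;
        close(fd);
    }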

Upvotes: 1

vines

Reputation: 5225

It's better to split the operation into two independent processes: data producing and file writing. The data-producing part would use a buffer for passing each iteration's data, and the file-writing part would use a queue to store write requests. The producer would then just post a write request and carry on, while the writer copes with the writing in the background.

Essentially, if the data is produced much faster than it can possibly be stored, you'll quickly end up holding most of it in the buffer. In that case your current approach seems quite reasonable as is, since little can then be done programmatically to improve the situation.

Upvotes: 1

Martin Beckett

Reputation: 96109

"Using mmap (the file size could be huge, on the order of GBs, is this approach robust enough?)"

mmap is the mechanism the OS uses to load programs, shared libraries and the page/swap file; it's as robust as any other file I/O and generally gives higher performance.

BUT on most OSes it's bad/difficult/impossible to expand the size of a mapped file while it's being used. So if you know the size of the data, or you are only reading, it's great. For a log/dump that you are continually adding to, it's less suitable, unless you know some maximum size.
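
If the total (or maximum) size is known up front, the usual approach is to pre-size the file with ftruncate() and write into the mapping; a sketch, with illustrative sizes and minimal error handling:

    #include <cstddef>
    #include <cstring>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <vector>

    int main() {
        const std::size_t N = 1000, iterations = 10000;
        const std::size_t chunk_bytes = 6 * N * sizeof(double);
        const std::size_t total_bytes = iterations * chunk_bytes;  // must be known in advance

        int fd = open("trajectory.bin", O_RDWR | O_CREAT, 0644);
        if (fd < 0 || ftruncate(fd, total_bytes) != 0) return 1;   // reserve the full size

        void* p = mmap(nullptr, total_bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) return 1;
        char* base = static_cast<char*>(p);

        std::vector<double> coords(6 * N);
        for (std::size_t it = 0; it < iterations; ++it) {
            // ... compute the next state into coords ...
            std::memcpy(base + it * chunk_bytes, coords.data(), chunk_bytes);
        }

        msync(base, total_bytes, MS_SYNC);                 // force write-back (optional)
        munmap(base, total_bytes);
        close(fd);
    }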

Upvotes: 0
