Reputation: 649
If I have multiple threads generating blocks of a file, what is the best way to write out the blocks?
For example, 5 threads work on a file of 500 blocks. Block 0 is not necessarily completed before block 1, but the output file on disk needs to be in order (block 0, block 1, block 2, ..., block 499).
The program is in C++. Can fwrite() somehow "random access" the file? The file is created from scratch, meaning that when block 5 is completed, the file may still be of size 0 because blocks 1 to 4 are not completed yet. Can I write out block 5 directly (with a proper fseek)?
This piece of code is performance critical, so I'm really curious about anything that can improve the performance. This looks like a multiple-producer (block generators), single-consumer (output writer) scenario. The ideal case is that thread A can continue generating the next block as soon as it completes the previous one.
If fwrite can be "random", then the output writer can simply take the outputs, seek, and then write. However, I'm not sure whether this design performs well at large scale.
Some limitations
Upvotes: 4
Views: 822
Reputation: 14688
Assuming each block is the same size, and that the blocks are generated in memory before they are required to be written to disk, then a combination of lseek and write would be perfectly fine.
If you are able to write the entire block in one write call, you would not gain any advantage from using fwrite, so just use write directly. However, you would need some sort of locking (a mutex) if all the threads share the same fd, because seek + write cannot be done atomically, and you would not want one thread to seek just before a second thread is about to write.
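A minimal sketch of the locked lseek + write approach, assuming a POSIX system and fixed-size blocks. The names kBlockSize, g_file_mutex, write_block and demo_out_of_order are hypothetical and chosen only for illustration, and the block size of 4096 is an arbitrary assumption:

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#include <algorithm>
#include <atomic>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical fixed block size; the real value depends on the application.
constexpr std::size_t kBlockSize = 4096;

// One mutex shared by every thread that writes through the same fd.
std::mutex g_file_mutex;

// Write one complete block at its final position in the file.
// The lock keeps the seek and the write together as one unit.
bool write_block(int fd, std::size_t index, const char* data) {
    std::lock_guard<std::mutex> lock(g_file_mutex);
    off_t offset = static_cast<off_t>(index) * static_cast<off_t>(kBlockSize);
    if (lseek(fd, offset, SEEK_SET) == static_cast<off_t>(-1)) return false;
    std::size_t done = 0;
    while (done < kBlockSize) {  // write() may write fewer bytes than asked
        ssize_t n = write(fd, data + done, kBlockSize - done);
        if (n < 0) return false;  // no retry on EINTR in this sketch
        done += static_cast<std::size_t>(n);
    }
    return true;
}

// Demo: each thread writes its share of the blocks in descending order,
// then the file is read back to check that every block is in place.
bool demo_out_of_order(const char* path, std::size_t blocks, unsigned threads) {
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0644);
    if (fd < 0) return false;
    std::atomic<bool> ok{true};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([=, &ok] {
            std::vector<char> buf(kBlockSize);
            for (std::size_t i = blocks; i-- > 0;) {
                if (i % threads != t) continue;  // not this thread's block
                std::fill(buf.begin(), buf.end(),
                          static_cast<char>('A' + i % 26));
                if (!write_block(fd, i, buf.data())) ok = false;
            }
        });
    }
    for (auto& th : pool) th.join();

    std::vector<char> buf(kBlockSize);
    for (std::size_t i = 0; i < blocks && ok; ++i) {
        off_t offset = static_cast<off_t>(i) * static_cast<off_t>(kBlockSize);
        char expected = static_cast<char>('A' + i % 26);
        if (pread(fd, buf.data(), kBlockSize, offset) !=
                static_cast<ssize_t>(kBlockSize) ||
            buf.front() != expected || buf.back() != expected) {
            ok = false;
        }
    }
    close(fd);
    unlink(path);  // remove the demo file
    return ok;
}
```

Calling demo_out_of_order("/tmp/blocks_demo.bin", 20, 5) exercises the out-of-order case with 5 threads. As a separate technique from the one described here, POSIX also provides pwrite, which takes the offset as an argument and therefore needs no separate seek.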
This further assumes that your file system is a standard file system, and not of some exotic nature, since not every input/output device supports lseek (for example, a pipe).
Update: lseek can seek beyond the end of the file. Just set the whence parameter to SEEK_SET and the offset to the absolute position in the file (fseek has the same option, but I have never used it).
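A small sketch of the stdio variant, assuming a POSIX system, where seeking past the end and then writing extends the file and the skipped region reads back as zero bytes. The function name write_block_past_eof is hypothetical. It writes only block `index` into a newly created file and reports the resulting file size:

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Create an empty file, write a single block at position `index`,
// and return the resulting file size in bytes (or -1 on error).
long write_block_past_eof(const char* path, std::size_t index,
                          std::size_t block_size) {
    std::FILE* f = std::fopen(path, "wb+");  // file starts at size 0
    if (!f) return -1;

    std::vector<char> block(block_size, 'x');
    // long is enough for this demo; very large files need a wider offset type.
    long offset = static_cast<long>(index * block_size);

    // Seek to the absolute position of the block, past the current end.
    bool ok = std::fseek(f, offset, SEEK_SET) == 0 &&
              std::fwrite(block.data(), 1, block_size, f) == block_size &&
              std::fflush(f) == 0;

    long size = -1;
    if (ok && std::fseek(f, 0, SEEK_END) == 0) {
        size = std::ftell(f);  // position at end of file == file size
    }
    std::fclose(f);
    std::remove(path);  // remove the demo file
    return size;
}
```

With index 5 and a block size of 4096, the file ends up 6 * 4096 bytes long even though blocks 0 to 4 were never written, so the earlier blocks can be filled in later at their own offsets.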
Upvotes: 1