hippietrail

Reputation: 16964

Writing a large number of files from a long-running process?

I have a project which scans a large file (2.5 GB), picking out strings that will then be written to some subset of several hundred files.

It would be fastest just to use normal buffered writes, but:

  1. I'm worried about running out of filehandles.
  2. I want to be able to watch the progress of the files while they're being written.
  3. I would prefer as little loss as possible if the process is interrupted. Incomplete files are still partially useful.

So instead I open each file in read/write mode, append the new line, and close it again.
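For concreteness, that per-line pattern looks roughly like this in C (a minimal sketch; the real code is Perl, and append_line is a hypothetical helper):

    #include <stdio.h>

    /* Hypothetical helper: append one line to the named file, opening
       and closing the file around every single write.  Append mode
       ("a") creates the file if needed and positions at the end, which
       has the same effect as opening read/write and seeking to the end. */
    int append_line(const char *path, const char *line)
    {
        FILE *f = fopen(path, "a");
        if (f == NULL)
            return -1;
        int ok = fputs(line, f) >= 0 && fputc('\n', f) != EOF;
        if (fclose(f) != 0)   /* fclose flushes; an error here means lost data */
            ok = 0;
        return ok ? 0 : -1;
    }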

This was fast enough much of the time, but I have found that on certain OSes this behaviour is a severe pessimization. The last time I ran it on my Windows 7 netbook, I interrupted it after several days!

I can implement some kind of MRU filehandle manager which keeps a limited number of files open and flushes each one after a set number of write operations. But is this overkill?

This must be a common situation; is there a "best practice", a "pattern"?

The current implementation is in Perl and has run on Linux, Solaris, and Windows, on everything from netbooks to phat servers. But I'm interested in the general problem: language-independent and cross-platform. I've thought of writing the next version in C or node.js.

Upvotes: 3

Views: 185

Answers (1)

On Linux, you can open a lot of files (thousands). You can limit the number of open handles in a single process with the setrlimit syscall and the ulimit shell builtin. You can query the limits with the getrlimit syscall, and also via /proc/self/limits (or /proc/1234/limits for the process with pid 1234). The system-wide maximum number of open files is available through /proc/sys/fs/file-max (on my system it is 1623114).
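For example, a process can query its own limit with getrlimit and raise the soft limit up to the hard limit with setrlimit (a minimal sketch):

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;
        if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
            perror("getrlimit");
            return 1;
        }
        printf("open files: soft=%lu hard=%lu\n",
               (unsigned long)rl.rlim_cur, (unsigned long)rl.rlim_max);

        /* An unprivileged process may raise its soft limit,
           but only up to its hard limit. */
        rl.rlim_cur = rl.rlim_max;
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
            perror("setrlimit");
            return 1;
        }
        return 0;
    }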

So on Linux you could simply not bother, and keep many files open at once.

And I would suggest maintaining a memoized cache of opened files and reusing them when possible (with an MRU policy). Don't open and close each file too often; do so only when some limit has been reached (e.g. when an open fails).

In other words, you could have your own file abstraction (or just a struct) which knows the file name, may hold an opened FILE* (or a null pointer), and keeps the current offset, perhaps also the time of the last open or write; then manage a collection of such things with a FIFO discipline (for those having an opened FILE*). You certainly want to avoid close-ing (and later re-open-ing) a file descriptor too often.
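A minimal sketch of such an abstraction, with hypothetical names (the surrounding cache and its eviction logic are left out):

    #include <stdio.h>
    #include <time.h>

    struct outfile {
        const char *name;   /* file name */
        FILE *fp;           /* opened stream, or NULL while closed */
        long offset;        /* current offset, kept for reopening */
        time_t last_used;   /* time of last open or write */
    };

    /* Write one line, (re)opening the file on demand.  On open failure
       the caller should close some cached entry and retry. */
    int outfile_puts(struct outfile *of, const char *line)
    {
        if (of->fp == NULL) {
            of->fp = fopen(of->name, "a");  /* append keeps earlier content */
            if (of->fp == NULL)
                return -1;
        }
        if (fputs(line, of->fp) < 0 || fputc('\n', of->fp) == EOF)
            return -1;
        of->offset = ftell(of->fp);
        of->last_used = time(NULL);
        return 0;
    }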

You might occasionally (i.e. once every few minutes) call sync(2), but don't call it too often (certainly not more than once per 10 seconds). If you are using buffered FILE-s, don't forget to fflush them from time to time. Again, don't do that very often.
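One way to rate-limit both, sketched with illustrative interval constants:

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    enum {
        FLUSH_INTERVAL = 10,   /* seconds between fflush passes */
        SYNC_INTERVAL  = 120   /* seconds between sync(2) calls */
    };

    /* Flush buffered streams at most once per interval, and call
       sync(2) even less often. */
    void maybe_flush(FILE **files, int count)
    {
        static time_t last_flush, last_sync;
        time_t now = time(NULL);

        if (now - last_flush >= FLUSH_INTERVAL) {
            for (int i = 0; i < count; i++)
                if (files[i] != NULL)
                    fflush(files[i]);   /* push stdio buffers to the kernel */
            last_flush = now;
        }
        if (now - last_sync >= SYNC_INTERVAL) {
            sync();                     /* schedule dirty pages for writeback */
            last_sync = now;
        }
    }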

Upvotes: 2
