Reputation: 2410
I'm currently writing a script that takes a database as input and generates all valid combinations from the 10+ tables, following certain rules. Since the output is pretty darn huge, I'm dumping it through gzip into a file, like this:
open( my $OUT, '|-', 'gzip > file' );
for (@data) {
    my $line = calculate($_);
    print $OUT $line;
}
Due to the nature of the beast, though, I end up making hundreds of thousands of small writes, one for each line. This means that between each calculation the script waits for gzip to receive the data and finish compressing it. At least I think so; I might be wrong.
In case I'm right, though, I'm wondering how I can make this print asynchronous, i.e. fire the data at gzip and then go on processing the data.
Upvotes: 2
Views: 276
Reputation: 62109
Pipes already use a buffer so that the writing program doesn't have to wait for the reading program. However, that buffer is usually fairly small (it's normally only 64KB on Linux) and not easily changed (it requires recompiling the kernel). If the standard buffer is not enough, the easiest thing to do is include a buffering program in the pipeline:
open( my $OUT, '|-', "bfr | gzip > file" );
bfr simply reads STDIN into an in-memory buffer and writes to STDOUT as fast as the next program allows. The default is a 5MB buffer, but you can change that with the -b option (e.g. bfr -b10m for a 10MB buffer).
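So, assuming bfr is installed, the only change to the original code is the extra stage in the pipeline (the -b10m flag is optional and just enlarges the buffer):
open( my $OUT, '|-', 'bfr -b10m | gzip > file' )
    or die "Cannot start pipeline: $!";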
Upvotes: 2
Reputation: 138357
Give IO::Compress::Gzip a try. It accepts a filehandle to write to, and you can set O_NONBLOCK on that filehandle.
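A minimal sketch of that approach, assuming the @data and calculate() from the question (the O_NONBLOCK part is left out; it would be set with Fcntl on $fh before passing it in):
use strict;
use warnings;
use IO::Compress::Gzip qw($GzipError);

# Open the target file yourself, then hand the filehandle to IO::Compress::Gzip.
open( my $fh, '>', 'file.gz' ) or die "Cannot open file.gz: $!";
my $z = IO::Compress::Gzip->new($fh)
    or die "IO::Compress::Gzip failed: $GzipError\n";

for (@data) {                     # @data and calculate() as in the question
    $z->print( calculate($_) );
}

$z->close();                      # flushes and finishes the gzip stream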
Upvotes: 4
Reputation: 3727
Naturally, you can do it in a thread or with a fork, as you wish. http://hell.jedicoder.net/?p=82
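A rough sketch of the threaded variant, assuming a threads-enabled perl, a reasonably recent Thread::Queue (for end()), and the question's @data and calculate(): the main thread keeps calculating while a second thread owns the gzip pipe and drains a queue.
use strict;
use warnings;
use threads;
use Thread::Queue;

my $queue = Thread::Queue->new();

# Writer thread: owns the gzip pipe and drains the queue.
my $writer = threads->create(sub {
    open( my $OUT, '|-', 'gzip > file' ) or die "Cannot start gzip: $!";
    while ( defined( my $line = $queue->dequeue() ) ) {
        print $OUT $line;
    }
    close $OUT;
});

# Main thread: calculation continues while the writer handles the I/O.
for (@data) {                     # @data and calculate() as in the question
    $queue->enqueue( calculate($_) );
}

$queue->end();                    # no more data; dequeue() then returns undef
$writer->join();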
Upvotes: 1