Reputation: 31
I have a Perl script which reads two files and processes them.
The first file - the info file - I store as a hash (3.5 GB).
The second file - the target file - I process using information from the info file and other subroutines as designed. (This target file ranges from 30-60 GB.)
So far everything works, but only serially. I want to run on all chunks in parallel:
while (chunks) {
    # do something
    sub a {}
    sub b {}
}
So basically, I want to read a chunk, write its output, and do this for multiple chunks at the same time. The while loop reads each line of a chunk file and calls various subroutines for processing.
Is there a way that I can read chunks in the background?
I don't want to re-read the info file for every chunk, because it is 3.5 GB and reading it into the hash takes up 3.5 GB of memory every time.
Right now the script takes 1-2 hours to run on a 30-60 GB target file.
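For context, here is a minimal sketch of the serial structure described above; the file names and the two helper subs are hypothetical stand-ins for the real script:

use strict;
use warnings;

# Hypothetical stand-ins: the real script builds a 3.5 GB hash and does real work here.
sub build_info_hash { my ($file) = @_; my %h; return %h }
sub process_line    { my ($line, $info) = @_; return $line }

my %info        = build_info_hash('info.txt');   # built once, reused for every chunk
my @chunk_files = glob 'target_chunk_*';         # pre-split pieces of the target file

for my $chunk_file (@chunk_files) {
    open my $in,  '<', $chunk_file       or die "$chunk_file: $!";
    open my $out, '>', "$chunk_file.out" or die "$chunk_file.out: $!";
    while (my $line = <$in>) {
        print {$out} process_line($line, \%info);   # stands in for the calls to sub a, sub b, ...
    }
}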
Upvotes: 2
Views: 1285
Reputation: 3465
What about the File::Map module (memory mapping)? It can easily read big files.
use strict;
use warnings;
use File::Map qw(map_file);

map_file my $map, $ARGV[0]; # $ARGV[0] - path to your file
# $map now behaves like a read-only string holding the whole file,
# without loading it all into memory at once.
# Do something with $map
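As a hedged sketch of how $map could then be consumed, the loop below walks the mapped data line by line with index/substr, so only one line at a time is copied out of the mapping (the processing step is a placeholder):

use strict;
use warnings;
use File::Map qw(map_file);

map_file my $map, $ARGV[0];                  # read-only view of the whole file

my $pos = 0;
while ($pos < length $map) {
    my $nl = index $map, "\n", $pos;
    $nl = length $map if $nl == -1;          # final line may lack a trailing newline
    my $line = substr $map, $pos, $nl - $pos;
    $pos = $nl + 1;
    # process $line here (e.g. call the existing subroutines)
}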
Upvotes: 1
Reputation: 57600
A 3.5 GB hash is very big; you should consider using a database instead. Depending on how you do this, you can keep accessing the database through the hash.
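One common way to do that, if the info file is only used for key-value lookups, is to tie the hash to an on-disk Berkeley DB with DB_File, so lookups go to the database file instead of a 3.5 GB in-memory structure. A minimal sketch, with a hypothetical database file name and keys:

use strict;
use warnings;
use DB_File;
use Fcntl;

# One-time conversion: tie a hash to an on-disk database and fill it from the info file.
tie my %info, 'DB_File', 'info.db', O_RDWR | O_CREAT, 0666, $DB_HASH
    or die "Cannot tie info.db: $!";

$info{'some_key'} = 'some_value';   # writes go to the database file, not RAM
my $value = $info{'some_key'};      # later runs can tie the same file and just read

untie %info;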
If memory were a non-issue, forking would be the easiest solution. However, forking duplicates the process, including the hash, and would only result in unnecessary swapping.
If you cannot free some memory, you should consider using threads. Perl threads only live inside the interpreter and are invisible to the OS. These threads feel similar to forking; however, you can declare variables as :shared (you have to use threads::shared).
See the official Perl threading tutorial.
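A minimal sketch of that approach, using Thread::Queue to hand chunk files to a fixed pool of threads (chunk names and the worker body are placeholders, and sharing a very large hash via :shared still carries noticeable overhead):

use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

# One copy of the lookup table inside the interpreter, visible to all threads.
my %info :shared;
# ... fill %info from the info file once ...

my $queue = Thread::Queue->new(glob 'target_chunk_*');   # hypothetical chunk files
$queue->end;                                             # nothing more will be queued

my @workers = map { threads->create(\&worker) } 1 .. 4;
$_->join for @workers;

sub worker {
    while (defined(my $chunk_file = $queue->dequeue)) {
        # read $chunk_file line by line, look values up in %info,
        # and write this chunk's own output file
    }
}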
Upvotes: 1