Reputation: 7045
Edit 1: I have the following logic in my Perl file (I added an output file handle).
for (my $i = 0; $i < 10; $i++) {
    my $outputFile = $i . "_out";
    open(outputHandler, ">$outputFile") or die "Couldn't open output file: $!";
    my $filePath = $i;
    open(Rfile_handle, $filePath) or die("Couldn't open input file: $!");
    while (<Rfile_handle>) {
        my $line = $_;
        # Do processing - line by line. Read global variable - no edit/update
        # add required fields in my $outputLine variable
        print outputHandler "$outputLine\n";
    }
    close Rfile_handle;
    close outputHandler;
}
I still see the same behavior: memory usage is constantly increasing. I have to kill the process and rerun the program from the last line executed. This is exactly what I'm doing. There is no change in the code, except the logic part of assigning and extracting data from JSON. Now, can we infer anything? Or what am I doing wrong?
End of Edit 1
I'm a novice programmer in Perl; I previously coded in C#. I have around 10 files of about 5 GB each, and I need to read and process them one by one. My system has only 4 GB of RAM, so I used the following way to read the files in a for loop:
for (int i = 0; i < 10; i++) {
    my $filePath = i;
    open(Rfile_handle, $filePath) or die("Could't open input file: $!");
    while (<Rfile_handle>) {
        my $line = $_;
        //Do processing - line by line
    }
    close Rfile_handle;
}
When I look at Task Manager, it shows memory usage increasing. Shouldn't Perl free the memory after completing one file and reuse it for the next file, like C# does for me? According to Task Manager, it is not freeing the memory. Can I dispose of/deallocate the memory somehow?
I've tried undef, but it doesn't free up the memory.
What should I do? And what is the best way to read files of such huge size in Perl? I want a way to reuse the memory occupied by the variables in the for loop.
Note: I can't use any other scripting or programming language.
Upvotes: 1
Views: 624
Reputation: 53488
Don't worry about it. Perl looks like it uses more memory because it does its own internal memory management: it still frees and reuses memory internally, it just doesn't hand it back to the operating system, so for obvious reasons the process balloons to the maximum size of your memory footprint.
In general, the ways of ensuring Perl is as efficient as possible are:
Lexically scope your variables (especially arrays/hashes) - Perl can then figure out when they're no longer used. (It uses reference counting to keep track.)
Use while loops to read files line by line, rather than a foreach loop (which will read the whole file into a temporary list first).
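To make the difference concrete, here is a small self-contained sketch (the demo file name is invented for illustration) showing that reading <$fh> in list context - which is what a foreach loop does implicitly - pulls every line into memory up front:

```perl
use strict;
use warnings;

# Build a tiny demo file (the name is arbitrary for this sketch).
my $demo = "slurp_demo.txt";
open( my $fh, ">", $demo ) or die $!;
print {$fh} "row $_\n" for 1 .. 5;
close $fh;

open( $fh, "<", $demo ) or die $!;
# In list context, <$fh> reads ALL remaining lines at once - this is
# what "for my $line (<$fh>)" does before iterating, and why it's the
# wrong choice for multi-GB files.
my @everything = <$fh>;
close $fh;
print scalar(@everything), "\n";   # prints 5
unlink $demo;
```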
More generally, you should probably be using a three-argument open, as two-argument open is bad style.
So in your code you should do:
open( my $input_fh, "<", $filepath ) or die $!;
while ( my $line = <$input_fh> ) {
    # do stuff;
}
A lexical $output_fh will have its reference count drop to zero at the end of each pass of your for loop, at which point it is closed and deallocated automatically.
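Putting those pieces together, here is a minimal self-contained sketch of the loop body using lexical handles and three-argument open (the demo file names are assumptions for illustration, not the real 0..9 inputs):

```perl
use strict;
use warnings;

# Create a small demo input so the sketch runs end to end.
my $in_file  = "demo_input.txt";
my $out_file = "demo_input_out.txt";
open( my $make_fh, ">", $in_file ) or die "Couldn't create demo input: $!";
print {$make_fh} "record $_\n" for 1 .. 3;
close $make_fh;

open( my $input_fh,  "<", $in_file )  or die "Couldn't open input file: $!";
open( my $output_fh, ">", $out_file ) or die "Couldn't open output file: $!";
while ( my $line = <$input_fh> ) {
    # do the per-line processing here, then write the result
    print {$output_fh} uc $line;
}
close $input_fh;
close $output_fh;
# In a real loop over many files, re-declaring the handles with 'my'
# each pass drops their reference counts to zero, so the memory is
# reclaimed internally and reused for the next file.
```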
What are you storing as you're processing your file in the while loop? Perl won't, by default, use memory equal to the file size unless you're 'saving' every line somehow.
Oh, and you have a bug:
my $filePath=i;
This'll set your $filePath to a file called i, which isn't going to work. Turn on:
use strict;
use warnings;
and you'll be told about this sort of problem. (Same problem in your for loop: i is not a valid variable name in Perl; you should be using $i.)
See also:
http://learn.perl.org/faq/perlfaq3.html#How-can-I-free-an-array-or-hash-so-my-program-shrinks-
http://perldoc.perl.org/perlfaq3.html#How-can-I-make-my-Perl-program-take-less-memory%3f
As a result of how Perl uses reference counting, there is a 'gotcha': you can create circular chains of references. Because they're still referenced, Perl won't garbage-collect them. If you're in danger of having this problem, you can use weaken() from Scalar::Util.
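As a sketch of that gotcha (the parent/child structure here is invented for the demo), weakening one link in the cycle lets the reference count fall to zero again:

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $parent = { name => 'parent' };
my $child  = { name => 'child'  };
$parent->{child} = $child;
$child->{parent} = $parent;    # circular: neither hash can be freed

# Break the cycle: the child's back-reference no longer counts
# toward the parent's reference count.
weaken( $child->{parent} );

undef $parent;                 # parent's refcount now hits zero...
print defined $child->{parent} ? "leaked\n" : "parent freed\n";
# prints "parent freed" - Perl set the weak reference to undef
```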
Upvotes: 6
Reputation: 46187
If you're doing your processing line by line as shown by the code in the question (as opposed to storing the entire file contents in an array, hash, or other data structure), and the consumed memory is substantially larger than the amount of data per line, then you most likely have a memory leak in your processing code: you're using a variable whose reference count never falls to 0 (probably due to circular references in a data structure), so Perl is never able to reuse that memory.
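As an illustration of that failure mode, here is a minimal self-contained sketch (the Leaky package is invented for the demo) where a per-iteration reference cycle keeps every object alive, so memory grows with each "line" processed:

```perl
use strict;
use warnings;

package Leaky;
my $destroyed = 0;
sub new             { return bless {}, shift }
sub DESTROY         { $destroyed++ }
sub destroyed_count { return $destroyed }

package main;

for ( 1 .. 100 ) {
    my $x = Leaky->new;
    my $y = Leaky->new;
    $x->{peer} = $y;    # $x holds $y ...
    $y->{peer} = $x;    # ... and $y holds $x: a reference cycle
    # Both variables go out of scope here, but each object still has
    # refcount 1 via its peer, so DESTROY never runs and the memory
    # is never reclaimed - exactly the leak described above.
}

print Leaky::destroyed_count(), "\n";   # prints 0 - nothing was freed
```

Running this under a memory profiler would show usage climbing by two hashes per iteration; adding weaken() to one of the two peer links would let all 200 objects be destroyed as they go out of scope.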
Upvotes: 2