Reputation: 11236
I have created a simple Perl script to read log files and process the data asynchronously.
The problem I am facing is that the script appears to use more and more memory the longer it runs, and the growth seems to be tied to the amount of data it processes. What I cannot work out is what exactly is using all this memory, and whether it is a leak or whether something is just holding onto it.
How can I modify the script below so that it no longer continuously consumes memory?
# Multithreaded: read multiple log files at the same time and
# process the lines asynchronously via a shared queue.
use strict;
use warnings;
use threads;
use Thread::Queue;
use threads::shared;

my $logq = Thread::Queue->new();
my %Servers :shared;
my %servername :shared;

# Split a CSV line into its fields, honouring double-quoted values.
sub csvsplit {
    my $line = shift;
    my $sep  = (shift or ',');

    return () unless $line;

    my @cells;
    my $re = qr/(?:^|$sep)(?:"([^"]*)"|([^$sep]*))/;

    while ($line =~ /$re/g) {
        my $value = defined $1 ? $1 : $2;
        push @cells, (defined $value ? $value : '');
    }
    return @cells;
}

# Consumer thread: drain the queue once a second and print counters
# for the last two fields of every line.
sub process_data {
    while (sleep(1)) {
        if ($logq->pending()) {
            my %sites;
            my %returns;
            while ($logq->pending() > 0) {
                my $data   = $logq->dequeue();
                my @fields = csvsplit($data);
                $returns{ $fields[$#fields - 1] }++;
                $sites{ $fields[$#fields] }++;
            }
            print "counter:$_, value=\"$sites{$_}\" />\n"   for (keys %sites);
            print "counter:$_, value=\"$returns{$_}\" />\n" for (keys %returns);
        }
    }
}

# Producer thread: tail a log file, re-opening it if it is rotated
# (inode changes or the file shrinks), and enqueue every new line.
sub read_file {
    my $myFile = $_[0];
    open(my $logfile, '<', $myFile) || die "error";
    my $Inode    = (stat($logfile))[1];
    my $fileSize = (stat($logfile))[7];
    seek $logfile, 0, 2;    # start at the end of the file

    for (;;) {
        while (<$logfile>) {
            chomp($_);
            $logq->enqueue($_);
        }
        sleep 5;
        # Re-open the file if it has been rotated or truncated.
        if ($Inode != (stat($myFile))[1] || (stat($myFile))[7] < $fileSize) {
            close($logfile);
            while (! -e $myFile) {
                sleep 2;
            }
            open($logfile, '<', $myFile) || die "error";
            $Inode    = (stat($logfile))[1];
            $fileSize = (stat($logfile))[7];
        }
        seek $logfile, 0, 1;    # clear EOF so new lines can be read
    }
}

my $thr1 = threads->create(\&read_file, "log");
my $thr4 = threads->create(\&process_data);
$thr1->join();
$thr4->join();
The memory only seems to increase while the program has data to process; if I just leave it idle, it maintains its current memory usage.
The increase is also larger at higher throughput, roughly half a MB every 5 seconds when around 2000 lines arrive in that time.
I have not included the CSV data as I do not think it is relevant. If you do think it is relevant and want me to add it, please give a valid reason.
GNU bash, version 3.2.57(1)-release (s390x-ibm-linux-gnu)
perl, v5.10.0
I have looked through other questions but cannot find much of relevance. If this is a duplicate or the relevant info is in another question, feel free to mark it as a dupe and I'll check it out.
If any more info is needed, just ask.
Upvotes: 1
Views: 154
Reputation: 33658
The reason is probably that the size of your Thread::Queue is unlimited. If the producer thread is faster than the consumer thread, your queue will continue to grow. So you should simply limit the size of your queue. For example, to set a limit of 1,000 queue items:
$logq->limit = 1000;
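In the script from the question, that line would go right after the queue is created. (The limit feature is a later addition to Thread::Queue, so the version bundled with perl 5.10.0 may first need updating from CPAN; this is only a sketch of where the change fits.)

use Thread::Queue;

my $logq = Thread::Queue->new();
$logq->limit = 1000;   # enqueue() now blocks whenever 1,000 items are already pending,
                       # so read_file waits for process_data instead of growing the queue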
(The way you use the pending method is wrong, by the way. You should only terminate if the return value is undefined.)
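One way to read that advice, sketched against the names in the question's script: let the consumer block in dequeue() and stop only when it returns undef. This assumes that read_file (or some shutdown path) eventually calls $logq->end(), which the current script never does, and it prints the totals once at the end rather than every second, so treat it as an illustration of the termination condition rather than a drop-in replacement.

# Consumer: block in dequeue() instead of polling pending().
# dequeue() returns undef only after $logq->end() has been called
# and the queue has been drained; that is the signal to stop.
sub process_data {
    my (%sites, %returns);
    while (defined(my $data = $logq->dequeue())) {
        my @fields = csvsplit($data);
        $returns{ $fields[-2] }++;
        $sites{ $fields[-1] }++;
    }
    print "counter:$_, value=\"$sites{$_}\" />\n"   for keys %sites;
    print "counter:$_, value=\"$returns{$_}\" />\n" for keys %returns;
}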
Upvotes: 2