Reputation: 291
I want to grep some log information from log files using Perl. The files are located in the following directory structure:
$jobDir/jobXXXX/host.log
where XXXX is a job number, from 1 to a few thousand. There are no other kinds of subdirectories under $jobDir and no files other than logs under jobXXXX. The script is:
my @Info;    # store the log information
my $Num = 0;
@Info = qx(grep "information" -r $jobDir);    # is this OK?
foreach (@Info) {
    if ($_ =~ /\((\d+)\)(.*)\((\d+)\)/) {
        Output(xxxxxxxx);
    }
    $Num = $Num + 1;    # count the lines
}
When the number of jobs reaches a few thousand, this script takes a very long time to produce its output.
Is there any way to improve its efficiency?
Thanks!
Upvotes: 1
Views: 210
Reputation: 15121
You should search those log files one by one, scanning each file line by line, instead of reading the whole output of grep
into memory (which can cost a lot of memory and slow down your program, and even your system):
# untested script
my $Num;
foreach my $log (<$jobDir/job*/host.log>) {
    open my $logfh, '<', $log or die "Cannot open $log: $!";
    while (<$logfh>) {
        if (m/information/) {
            if (m/\((\d+)\)(.*)\((\d+)\)/) {
                Output(xxx);
            }
            $Num++;
        }
    }
    close $logfh;
}
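If it helps, here is a minimal, self-contained version of that idea (an untested sketch: the $jobDir value and the body of Output are placeholders I made up, since the original script does not show them):

#!/usr/bin/perl
use strict;
use warnings;

# Assumptions: $jobDir and Output() exist in the real script; these stand-ins
# are only here so the sketch runs on its own.
my $jobDir = '/path/to/jobs';
my $Num    = 0;

sub Output { print join(' ', @_), "\n" }   # hypothetical: just print the captures

foreach my $log (glob "$jobDir/job*/host.log") {
    open my $logfh, '<', $log or die "Cannot open $log: $!";
    while (my $line = <$logfh>) {
        next unless $line =~ /information/;
        if ($line =~ /\((\d+)\)(.*)\((\d+)\)/) {
            Output($1, $2, $3);            # pass the captured fields along
        }
        $Num++;                            # count matching lines
    }
    close $logfh;
}
print "$Num matching lines found\n";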
Upvotes: 5
Reputation: 123270
While it would be more elegant to use the matching built into Perl (see the other answer), calling the grep
command can be more efficient and faster, especially if there is a lot of data but only a few matches. The way you currently call it, however, first runs grep and collects all of its output, and only then scans through that output. This needs more memory, because everything is collected first, and you have to wait for the output until grep has finished. It is better to process each line as soon as grep produces it:
open(my $fh, '-|', 'grep', 'information', '-r', $jobDir) or die $!;
while (<$fh>) {
    if (/\((\d+)\)(.*)\((\d+)\)/) {
        Output(xxxxxxxx);
    }
    $Num = $Num + 1;    # number count
}
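For completeness, here is a self-contained sketch of the same streaming approach (with a made-up $jobDir and a hypothetical Output that just prints its arguments), which also closes the pipe and checks grep's exit status, since grep exits non-zero when it finds no matches:

#!/usr/bin/perl
use strict;
use warnings;

my $jobDir = '/path/to/jobs';              # assumption: set elsewhere in the real script
my $Num    = 0;

sub Output { print join(' ', @_), "\n" }   # hypothetical stand-in for the real Output()

# The list form of open runs grep directly (no shell), so $jobDir needs no escaping.
open(my $fh, '-|', 'grep', 'information', '-r', $jobDir) or die $!;
while (<$fh>) {
    if (/\((\d+)\)(.*)\((\d+)\)/) {
        Output($1, $2, $3);
    }
    $Num++;                                # count every line grep printed
}
close $fh;
warn "grep found nothing or failed (exit status ", $? >> 8, ")\n" if $?;
print "$Num lines processed\n";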
Upvotes: 5