Reputation: 6281
I want to search for a line in a file, using regex, inside a Perl script.
Assuming it is on a system with grep installed, is it better to run grep through an open() command, or to open() the file directly and use a while loop and an if ($line =~ m/regex/)?
Upvotes: 5
Views: 4933
Reputation: 13906
One thing to be careful of with grep: In recent Linux distributions, if your LANG environment variable defines a UTF-8 type (e.g. mine is LANG=en_GB.UTF-8) then grep, sed, sort and probably a bunch of other text-processing utilities run about 10 times more slowly. So watch out for that if you are doing performance comparisons. I alias my grep command now to:
LANG= LANGUAGE= /bin/grep
Edit: Actually, it's more like 100 times more slowly
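If grep is launched from a Perl script rather than from an aliased shell, the locale can be overridden for the child process only. This is a minimal sketch, assuming grep is on the PATH; `grep_c_locale` is a hypothetical helper name, and `LC_ALL=C` is my substitution for emptying LANG, chosen because LC_ALL takes precedence over LANG and the other LC_* variables.

```perl
use strict;
use warnings;

# Hypothetical helper: run grep under the C locale and return the matching lines.
sub grep_c_locale {
    my ($pattern, $file) = @_;

    # local restores the previous value of LC_ALL when the sub returns,
    # so the rest of the script keeps its normal locale.
    local $ENV{LC_ALL} = 'C';

    # List-form pipe open: no shell is involved, so $pattern is passed
    # to grep exactly as given.
    open(my $fh, '-|', 'grep', '-e', $pattern, '--', $file)
        or die "Cannot run grep: $!";
    my @lines = <$fh>;
    close($fh);    # grep exits with status 1 when nothing matched
    return @lines;
}
```

Under the C locale grep works on bytes, so a pattern that relies on multibyte characters may match differently than it does under a UTF-8 locale.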
Upvotes: 5
Reputation: 1356
I once wrote a script to search for some regular expressions across some big text files (about 10 MB each). I did it with Perl regexes and noticed it was running quite slowly. So I tried running grep from the script, and the speed boost was quite considerable. So, in my own experience, Perl's built-in regexes are slower than grep, but you'll probably only notice it with large files. My advice is: try it both ways, and see how it goes.
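One way to "try it both ways" is to time each approach on your own files. The sketch below is only an illustration: the helper names are hypothetical, it measures wall-clock time with Time::HiRes so that the cost of starting grep is included, and it assumes a grep that supports `-m` (a GNU/BSD extension, not POSIX). Perl and grep use different regex dialects, so each side needs a pattern written for it.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Candidate 1: read the file in Perl and stop at the first match.
sub first_match_perl {
    my ($regex, $file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    while (my $line = <$fh>) {
        return $line if $line =~ $regex;
    }
    return;
}

# Candidate 2: let grep do the search; -m 1 stops after the first match.
sub first_match_grep {
    my ($pattern, $file) = @_;
    open(my $fh, '-|', 'grep', '-m', '1', '-e', $pattern, '--', $file)
        or die "Cannot run grep: $!";
    my $line = <$fh>;
    close($fh);
    return $line;
}

# Wall-clock seconds taken by $runs calls of $code.
sub time_runs {
    my ($code, $runs) = @_;
    my $start = time();
    $code->() for 1 .. $runs;
    return time() - $start;
}

# Usage (file name and patterns are placeholders):
#   my $file = 'big.log';
#   printf "perl: %.3fs\n",
#       time_runs(sub { first_match_perl(qr/foo[0-9]+/, $file) }, 20);
#   printf "grep: %.3fs\n",
#       time_runs(sub { first_match_grep('foo[0-9][0-9]*', $file) }, 20);
```

Run it on files of realistic size and under the locale you will use in production, since both affect the result.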
Upvotes: 2
Reputation: 98398
It depends. If you want to optimize for development time,
$line = `grep '$regex' file | head -n 1`;
is clearly the thing to do.
But it comes at the cost of having to start external processes, depending on things besides perl being installed, and losing the opportunity to do detailed error reporting when something goes wrong.
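Some of that lost error reporting can be recovered if grep is started through a list-form pipe open instead of backticks. This is a sketch under assumptions, not a drop-in replacement: `grep_first` is a hypothetical name, and `-m 1` (standing in for `| head -n 1`) is a GNU/BSD extension. Because no shell is involved, a quote character inside the pattern cannot break the command the way it can in the backtick version.

```perl
use strict;
use warnings;

# Hypothetical helper: first line of $file matching $pattern, or undef if none.
# Dies when grep cannot be started or reports an error.
sub grep_first {
    my ($pattern, $file) = @_;

    open(my $fh, '-|', 'grep', '-m', '1', '-e', $pattern, '--', $file)
        or die "Cannot start grep: $!";
    my $line = <$fh>;
    close($fh);    # closing a pipe sets $? to grep's wait status

    die "grep was killed by signal " . ($? & 127) if $? & 127;

    # Exit status: 0 = a line matched, 1 = no match, greater than 1 = error.
    my $exit = $? >> 8;
    die "grep failed on '$file' (exit status $exit)" if $exit > 1;

    return $line;
}
```

The script still depends on an external grep and pays for a process start on every call, so the trade-off described above remains.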
Upvotes: 3
Reputation: 46187
You don't need to open the file explicitly.
my $regex = qr/blah/;
while (<>) {
    if (/$regex/) {
        print;
        exit;
    }
}
print "Not found\n";
Since you seem concerned about performance, I let the match and print use the default $_ provided by not assigning <> to anything, which is marginally faster. In normal production code,
while (my $line = <>) {
    if ($line =~ /$regex/) {
        print $line;
        exit;
    }
}
would be preferred.
Edit: This assumes that the file to check is given on the command line, which I just noticed you had not stated applies in your case.
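If the file name comes from somewhere other than the command line, the same loop works with an explicit open. A minimal sketch, where `first_matching_line` is a hypothetical helper name that returns the line instead of printing it and exiting:

```perl
use strict;
use warnings;

# Hypothetical helper: first line of $file matching $regex, or undef if none.
sub first_matching_line {
    my ($regex, $file) = @_;

    # Three-argument open with a lexical handle; report why it failed.
    open(my $fh, '<', $file) or die "Cannot open '$file': $!";

    while (my $line = <$fh>) {
        # The lexical handle is closed automatically when it goes out of scope.
        return $line if $line =~ $regex;
    }
    return;
}

# Usage (file name is a placeholder):
#   my $line = first_matching_line(qr/blah/, 'input.txt');
#   print defined $line ? $line : "Not found\n";
```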
Upvotes: 6
Reputation: 21628
It depends.
I'd say to do it in Perl unless performance forces you to optimize.
Upvotes: 3
Reputation: 346317
In a modern Perl implementation, the regexp code should be just as fast as in grep, but if you're concerned about performance, why don't you simply try it out? From a code cleanliness and robustness standpoint, calling an external command line tool is definitely not good.
Upvotes: 9