lamcro

Reputation: 6281

How can I quickly find the first line of a file that matches a regex?

I want to search for a line in a file, using regex, inside a Perl script.

Assuming it is in a system with grep installed, is it better to:

Upvotes: 5

Views: 4933

Answers (7)

SergioAraujo

Reputation: 11810

sed -n '/pattern/{p;q;}' file

Upvotes: 2

Adrian Pronk

Reputation: 13906

One thing to be careful of with grep: in recent Linux distributions, if your LANG environment variable specifies a UTF-8 locale (e.g. mine is LANG=en_GB.UTF-8), then grep, sed, sort and probably a bunch of other text-processing utilities run about 10 times more slowly. So watch out for that if you are doing performance comparisons. I now alias my grep command to:

LANG= LANGUAGE= /bin/grep

Edit: Actually, it's more like 100 times slower.
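
The alias above clears LANG and LANGUAGE, but a locale set through LC_ALL or LC_CTYPE would still apply. Here is a minimal sketch of a stricter variant, assuming a POSIX shell: LC_ALL overrides LANG and the other LC_* variables, and writing it as a prefix limits the change to that one command. The sample file and its contents are made up for illustration, and how much time this saves will depend on the grep version and the pattern.

```shell
# Hypothetical sample data; substitute a real file.
sample=$(mktemp)
printf 'alpha\nbeta 42\ngamma 42\n' > "$sample"

# The LC_ALL=C prefix applies to this grep invocation only;
# the rest of the shell session keeps its UTF-8 locale.
first_match=$(LC_ALL=C grep '[0-9]' "$sample" | head -n 1)

printf '%s\n' "$first_match"   # prints: beta 42
rm -f "$sample"
```

Under the C locale grep treats the input as bytes, so classes such as [[:alpha:]] will not match non-ASCII letters. That is worth checking before applying the override to non-English text.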

Upvotes: 5

Marc

Reputation: 1356

I once wrote a script to search for some regular expressions across some big text files (about 10 MB each). I did it with Perl regexes and noticed it was running quite slowly. So I tried running grep from the script, and the speed boost was quite considerable. So, in my own experience, Perl's built-in regexes are slower than grep. But you'll probably only notice it with large files. My advice is: try it both ways, and see how it goes.

Upvotes: 2

ysth

Reputation: 98398

It depends. If you want to optimize for development time,

$line = `grep '$regex' file | head -n 1`;

is clearly the thing to do.

But it comes at the cost of having to start external processes, depending on things besides perl being installed, and losing the opportunity to do detailed error reporting when something goes wrong.
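
Some of that error information can still be recovered from grep's exit status: 0 means a line matched, 1 means nothing matched, and anything higher means grep itself failed (for example, an unreadable file). The sketch below shows this from the shell side. The function name first_matching_line is hypothetical, and -m 1 (stop after the first match) is an assumption about the installed grep: GNU and BSD grep support it, but POSIX does not require it.

```shell
# first_matching_line PATTERN FILE   (hypothetical helper name)
# Prints the first line of FILE matching PATTERN and returns grep's status.
first_matching_line() {
  # -m 1 stops reading after the first match (GNU/BSD extension);
  # -e and -- keep patterns or file names starting with "-" from
  # being parsed as options.
  if line=$(grep -m 1 -e "$1" -- "$2"); then
    rc=0
  else
    rc=$?
  fi
  case $rc in
    0) printf '%s\n' "$line" ;;                    # a line matched
    1) echo "no match" >&2 ;;                      # ran fine, found nothing
    *) echo "grep failed with status $rc" >&2 ;;   # e.g. unreadable file
  esac
  return "$rc"
}
```

When grep is run through backticks in Perl, the child's status ends up in $?, and $? >> 8 gives the exit code, so the same three-way distinction is available there. With the grep | head pipeline above, however, the status seen is that of head, not grep.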

Upvotes: 3

Dave Sherohman

Reputation: 46187

You don't need to open the file explicitly.

my $regex = qr/blah/;
while (<>) {
  if (/$regex/) {
    print;
    exit;
  }
}
print "Not found\n";

Since you seem concerned about performance, I let the match and print use the default $_ provided by not assigning <> to anything, which is marginally faster. In normal production code,

while (my $line = <>) {
  if ($line =~ /$regex/) {
    print $line;
    exit;
  }
}

would be preferred.

Edit: This assumes that the file to check is given on the command line, which, as I just noticed, you did not say is true in your case.

Upvotes: 6

Darron

Reputation: 21628

It depends.

  • Working inside Perl saves you the process startup time and other related resource costs.
  • grep is probably faster than doing the same job in Perl, but not hugely so.

I'd say to do it in Perl unless performance forces you to optimize.

Upvotes: 3

Michael Borgwardt

Reputation: 346317

In a modern Perl implementation, the regexp code should be just as fast as in grep, but if you're concerned about performance, why don't you simply try it out? From a code cleanliness and robustness standpoint, calling an external command line tool is definitely not good.

Upvotes: 9
