RobbySherwood
RobbySherwood

Reputation: 361

performance grep vs perl -ne

I made a performance test with really surprising result: perl is more than 20 times faster!

Is this normal? Does it result from my regular expression? is egrep far slower than grep? ... i tested on a current cygwin and a current OpenSuSE 13.1 in virtualbox.

Fastest Test with perl:

time zcat log.gz \
| perl -ne 'print if ($_ =~ /^\S+\s+\S+\s+(ERROR|WARNING|SEVERE)\s/ )' 
| tail
2014-06-24 14:51:43,929 SEVERE ajp-0.0.0.0-8009-13             SessionDataUpdateManager                cannot register active data when window has no name
2014-06-24 14:52:01,031 ERROR  HFN SI ThreadPool(4)-442        CepEventUnmarshaler                     Unmarshaled Events Duration: 111
2014-06-24 14:52:03,556 ERROR  HFN SI ThreadPool(4)-444        CepEventUnmarshaler                     Unmarshaled Events Duration: 52
2014-06-24 14:52:06,789 SEVERE ajp-0.0.0.0-8009-1              SessionDataUpdateManager                cannot register active data when window has no name
2014-06-24 14:52:06,792 SEVERE ajp-0.0.0.0-8009-1              SessionDataUpdateManager                cannot register active data when window has no name
2014-06-24 14:52:07,371 SEVERE ajp-0.0.0.0-8009-9              SessionDataUpdateManager                cannot register active data when window has no name
2014-06-24 14:52:07,373 SEVERE ajp-0.0.0.0-8009-9              SessionDataUpdateManager                cannot register active data when window has no name
2014-06-24 14:52:07,780 SEVERE ajp-0.0.0.0-8009-11             SessionDataUpdateManager                cannot register active data when window has no name
2014-06-24 14:52:07,782 SEVERE ajp-0.0.0.0-8009-11             SessionDataUpdateManager                cannot register active data when window has no name
2014-06-24 15:06:24,119 ERROR  HFN SI ThreadPool(4)-443        CepEventUnmarshaler                     Unmarshaled Events Duration: 117

real    0m0.151s
user    0m0.062s
sys     0m0.139s

fine!

far slower test with egrep:

time zcat log.gz \
| egrep '^\S+\s+\S+\s+(ERROR|WARNING|SEVERE)\s' 
| tail
...

real    0m2.454s
user    0m2.448s
sys     0m0.092s

(Output was same as above...)

finally even slower grep with different notation (my first try)

time zcat log.gz \
| egrep '^[^\s]+\s+[^\s]+\s+(ERROR|WARNING|SEVERE)\s'
| tail
...

real    0m4.295s
user    0m4.272s
sys     0m0.138s

(Output was same as above...)

The ungzipped file size is about 2.000.000 lines an un-gzip-ped 500MBytes - matching line count is very small.

my tested versions:

perhaps some Bug with newer grep versions?

Upvotes: 2

Views: 2532

Answers (3)

Andy Lester
Andy Lester

Reputation: 93676

You should be able to make the Perl a little bit faster by making the parentheses non-capturing:

(?:ERROR|WARNING|SEVERE)

Also, it's unnecessary to match against $_. $_ is assumed if there is nothing specified. That's why it exists.

perl -ne 'print if /^\S+\s+\S+\s+(?:ERROR|WARNING|SEVERE)\s/'

Upvotes: 1

RobbySherwood
RobbySherwood

Reputation: 361

As Chris Hamel commented.... not using the "|" atom grep becomes faster about 10+ times - still slower than perl.

time zcat log.gz \
| egrep '^[^\s]+\s+[^\s]+\s+(ERROR)\s'
| tail
...
real    0m0.216s
user    0m0.062s
sys     0m0.123s

So with 2 "|" atoms the grep run gets more than 3 times slower than running three greps after each other - sounds like a Bug for me... any earlier grep version to test around? i have a reHat5 also... grep seems similar slow there...

Upvotes: 0

horshack
horshack

Reputation: 153

You get tricked by your operating system's cache. When reading and grepping files some layers to the filesystem get walked through:

  • Harddrive own cache
  • OS read cache

To really know what's going on it's a good idea to warm up these caches by running some tests which do their work but do not count. After these test stop your runnning time.

Upvotes: 0

Related Questions