Reputation: 361
I made a performance test with really surprising result: perl is more than 20 times faster!
Is this normal? Does it result from my regular expression? is egrep far slower than grep? ... i tested on a current cygwin and a current OpenSuSE 13.1 in virtualbox.
Fastest Test with perl:
time zcat log.gz \
| perl -ne 'print if ($_ =~ /^\S+\s+\S+\s+(ERROR|WARNING|SEVERE)\s/ )'
| tail
2014-06-24 14:51:43,929 SEVERE ajp-0.0.0.0-8009-13 SessionDataUpdateManager cannot register active data when window has no name
2014-06-24 14:52:01,031 ERROR HFN SI ThreadPool(4)-442 CepEventUnmarshaler Unmarshaled Events Duration: 111
2014-06-24 14:52:03,556 ERROR HFN SI ThreadPool(4)-444 CepEventUnmarshaler Unmarshaled Events Duration: 52
2014-06-24 14:52:06,789 SEVERE ajp-0.0.0.0-8009-1 SessionDataUpdateManager cannot register active data when window has no name
2014-06-24 14:52:06,792 SEVERE ajp-0.0.0.0-8009-1 SessionDataUpdateManager cannot register active data when window has no name
2014-06-24 14:52:07,371 SEVERE ajp-0.0.0.0-8009-9 SessionDataUpdateManager cannot register active data when window has no name
2014-06-24 14:52:07,373 SEVERE ajp-0.0.0.0-8009-9 SessionDataUpdateManager cannot register active data when window has no name
2014-06-24 14:52:07,780 SEVERE ajp-0.0.0.0-8009-11 SessionDataUpdateManager cannot register active data when window has no name
2014-06-24 14:52:07,782 SEVERE ajp-0.0.0.0-8009-11 SessionDataUpdateManager cannot register active data when window has no name
2014-06-24 15:06:24,119 ERROR HFN SI ThreadPool(4)-443 CepEventUnmarshaler Unmarshaled Events Duration: 117
real 0m0.151s
user 0m0.062s
sys 0m0.139s
fine!
far slower test with egrep:
time zcat log.gz \
| egrep '^\S+\s+\S+\s+(ERROR|WARNING|SEVERE)\s'
| tail
...
real 0m2.454s
user 0m2.448s
sys 0m0.092s
(Output was same as above...)
finally even slower grep with different notation (my first try)
time zcat log.gz \
| egrep '^[^\s]+\s+[^\s]+\s+(ERROR|WARNING|SEVERE)\s'
| tail
...
real 0m4.295s
user 0m4.272s
sys 0m0.138s
(Output was same as above...)
The ungzipped file size is about 2.000.000 lines an un-gzip-ped 500MBytes - matching line count is very small.
my tested versions:
perhaps some Bug with newer grep versions?
Upvotes: 2
Views: 2532
Reputation: 93676
You should be able to make the Perl a little bit faster by making the parentheses non-capturing:
(?:ERROR|WARNING|SEVERE)
Also, it's unnecessary to match against $_
. $_
is assumed if there is nothing specified. That's why it exists.
perl -ne 'print if /^\S+\s+\S+\s+(?:ERROR|WARNING|SEVERE)\s/'
Upvotes: 1
Reputation: 361
As Chris Hamel commented.... not using the "|" atom grep becomes faster about 10+ times - still slower than perl.
time zcat log.gz \
| egrep '^[^\s]+\s+[^\s]+\s+(ERROR)\s'
| tail
...
real 0m0.216s
user 0m0.062s
sys 0m0.123s
So with 2 "|" atoms the grep run gets more than 3 times slower than running three greps after each other - sounds like a Bug for me... any earlier grep version to test around? i have a reHat5 also... grep seems similar slow there...
Upvotes: 0
Reputation: 153
You get tricked by your operating system's cache. When reading and grepping files some layers to the filesystem get walked through:
To really know what's going on it's a good idea to warm up these caches by running some tests which do their work but do not count. After these test stop your runnning time.
Upvotes: 0