Reputation: 1716
I have a giant log file which, among other things, reports run times. That's the information I want to extract. The log has lines that look like this:
Info: Executed check 'data_existence', result 'pass', took 0 s.
Info: Executed check 'message', result 'pass', took 20 s.
Info: Executed check 'blu', result 'pass', took 2 minutes.
Info: Executed check 'bla', result 'pass', took 2.5 minutes.
Info: Executed check 'foo', result 'pass', took 3.4 hours.
Info: Executed check 'bar', result 'pass', took 2.7 days.
I want to extract all lines that say 'Info ... took' (there are tons of other lines in between), but to reduce clutter I want to skip lines that report only seconds.
So I wrote:
egrep 'Info: .*took\s*\d*\s*[mhd]' LOGs/my.log
Surprisingly (to me), it did not work: it came back blank, although the checker at https://regex101.com/ said my pattern was finding something.
What's missing?
Thanks, Gert
@John1024
sc-xterm-26:~> cat test
Info: Executed check 'data_existence', result 'pass', took 0 s.
Info: Executed check 'message', result 'pass', took 20 s.
Info: Executed check 'blu', result 'pass', took 2 minutes.
Info: Executed check 'blu', result 'pass', took 12 minutes.
Info: Executed check 'bla', result 'pass', took 2.5 minutes.
Info: Executed check 'foo', result 'pass', took 3.4 hours.
Info: Executed check 'bar', result 'pass', took 2.7 days.
sc-xterm-26:~>
sc-xterm-26:~>
sc-xterm-26:~> uname -a
Linux sc-xterm-26 3.0.52 #2 SMP Thu Dec 6 02:40:34 PST 2012 x86_64 x86_64 x86_64 GNU/Linux
sc-xterm-26:~> grep --version
grep (GNU grep) 2.5.1
Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
sc-xterm-26:~> grep -E 'Info: .*took\s*[0-9.]*\s*[mhd]' test
sc-xterm-26:~>
sc-xterm-26:~> grep -E 'Info: .*took\s*[[:digit:].]*\s*[mhd]' test
sc-xterm-26:~>
@All
I put the query into a Tcl script and it works fine, so I no longer need a grep-based solution. Best, Gert.
Upvotes: 0
Views: 117
Reputation: 113904
grep does not recognize \d. Try:
$ grep -E 'Info:.*took\s*[0-9.]*\s*[mhd]' logfile
Info: Executed check 'bla', result 'pass', took 2.5 minutes.
Info: Executed check 'foo', result 'pass', took 3.4 hours.
Info: Executed check 'bar', result 'pass', took 2.7 days.
Or, better yet:
$ grep -E 'Info:.*took\s*[[:digit:].]*\s*[mhd]' logfile
Info: Executed check 'bla', result 'pass', took 2.5 minutes.
Info: Executed check 'foo', result 'pass', took 3.4 hours.
Info: Executed check 'bar', result 'pass', took 2.7 days.
Notes:
egrep is deprecated. Use grep -E instead.
grep is supposed to support POSIX regular expressions. \s is a GNU extension and may not be portable. \d is not supported by grep -E at all (GNU grep accepts it only in Perl mode, grep -P, where that is available).
[:digit:] is unicode-safe, which makes it a better choice than 0-9. It has to appear inside a bracket expression, as in [[:digit:].].
To match floating point numbers, one must allow a decimal point in addition to digits. Note that, outside of [...], the period . is a wildcard. Inside [...], by contrast, it only matches a period.
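To see the difference between a bare period and a bracketed one, here is a minimal sketch. The two input strings (2.5 and 2x5) and the variable names are made up for illustration; they are not from the log in the question.

```shell
# Outside brackets, "." matches any single character,
# so both "2.5" and "2x5" get through.
any=$(printf '2.5\n2x5\n' | grep -E '^2.5$')

# Inside brackets, "." is a literal period,
# so only "2.5" gets through.
literal=$(printf '2.5\n2x5\n' | grep -E '^2[.]5$')

printf 'unbracketed pattern matched:\n%s\n' "$any"
printf 'bracketed pattern matched:\n%s\n' "$literal"
```

This is why the patterns in this answer put the period inside the bracket expression next to the digits.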
For greps that do not support \s, try:
$ grep -E 'Info:.*took[[:space:]]*[[:digit:].]*[[:space:]]*[mhd]' logfile
Info: Executed check 'bla', result 'pass', took 2.5 minutes.
Info: Executed check 'foo', result 'pass', took 3.4 hours.
Info: Executed check 'bar', result 'pass', took 2.7 days.
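If you want to try the portable pattern without touching the real log, the following sketch recreates the six sample lines from the question in a temporary file and runs the command against it. The variable names logfile and matches are just placeholders, and mktemp is assumed to be available on your system.

```shell
# Recreate the sample lines from the question in a temporary file.
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
Info: Executed check 'data_existence', result 'pass', took 0 s.
Info: Executed check 'message', result 'pass', took 20 s.
Info: Executed check 'blu', result 'pass', took 2 minutes.
Info: Executed check 'bla', result 'pass', took 2.5 minutes.
Info: Executed check 'foo', result 'pass', took 3.4 hours.
Info: Executed check 'bar', result 'pass', took 2.7 days.
EOF

# POSIX character classes only: no \s and no \d,
# so nothing here relies on GNU extensions.
matches=$(grep -E 'Info:.*took[[:space:]]*[[:digit:].]*[[:space:]]*[mhd]' "$logfile")
printf '%s\n' "$matches"

# Clean up the temporary file.
rm -f "$logfile"
```

With those six lines as input, the two entries measured in seconds are dropped and the four entries measured in minutes, hours, or days are printed, including the whole-number 'blu' line.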
Upvotes: 1