Gert Gottschalk

Reputation: 1716

Pattern for egrep

I have a giant log file which among other things talks about run times. That's the information I want to extract. The log has lines that look like this:

Info: Executed check 'data_existence', result 'pass', took 0 s.
Info: Executed check 'message', result 'pass', took 20 s.
Info: Executed check 'blu', result 'pass', took 2 minutes.
Info: Executed check 'bla', result 'pass', took 2.5 minutes.
Info: Executed check 'foo', result 'pass', took 3.4 hours.
Info: Executed check 'bar', result 'pass', took 2.7 days.

I want to extract all lines of the form 'Info ... took' (there is a lot of other text in between), but to reduce clutter I want to skip lines that report only seconds.

So I wrote:

egrep 'Info: .*took\s*\d*\s*[mhd]' LOGs/my.log

Surprisingly (to me), it did not work: it came back blank, even though the checker at https://regex101.com/ said my pattern was finding something.

What's missing?

Thanks, Gert

@John1024

sc-xterm-26:~> cat test
Info: Executed check 'data_existence', result 'pass', took 0 s.
Info: Executed check 'message', result 'pass', took 20 s.
Info: Executed check 'blu', result 'pass', took 2 minutes.
Info: Executed check 'blu', result 'pass', took 12 minutes.
Info: Executed check 'bla', result 'pass', took 2.5 minutes.
Info: Executed check 'foo', result 'pass', took 3.4 hours.
Info: Executed check 'bar', result 'pass', took 2.7 days.
sc-xterm-26:~>
sc-xterm-26:~>
sc-xterm-26:~> uname -a
Linux sc-xterm-26 3.0.52 #2 SMP Thu Dec 6 02:40:34 PST 2012 x86_64 x86_64 x86_64 GNU/Linux
sc-xterm-26:~> grep --version
grep (GNU grep) 2.5.1

Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

sc-xterm-26:~> grep -E 'Info: .*took\s*[0-9.]*\s*[mhd]' test
sc-xterm-26:~>
sc-xterm-26:~> grep -E 'Info: .*took\s*[[:digit:].]*\s*[mhd]' test
sc-xterm-26:~>

@All

I put the query into a Tcl script and it works fine, so I no longer need a grep-based solution. Best, Gert.

Upvotes: 0

Views: 117

Answers (2)

John1024

Reputation: 113904

grep does not recognize \d. Try:

$ grep -E 'Info:.*took\s*[0-9.]*\s*[mhd]' logfile
Info: Executed check 'bla', result 'pass', took 2.5 minutes.
Info: Executed check 'foo', result 'pass', took 3.4 hours.
Info: Executed check 'bar', result 'pass', took 2.7 days.

Or, better yet:

$ grep -E 'Info:.*took\s*[[:digit:].]*\s*[mhd]' logfile
Info: Executed check 'bla', result 'pass', took 2.5 minutes.
Info: Executed check 'foo', result 'pass', took 3.4 hours.
Info: Executed check 'bar', result 'pass', took 2.7 days.

Notes:

  1. egrep is deprecated. Use grep -E instead.

  2. grep is supposed to support POSIX regular expressions. \s is a GNU extension and may not be portable. \d is a Perl-style escape, which is why regex101.com accepts it, but grep -E does not support it.

  3. [:digit:] is a POSIX character class that does not depend on the locale's collation order, which makes it a safer choice than the range 0-9.

  4. To match floating point numbers, one must allow a decimal point in addition to digits. Note that, outside of [...], the period . is a wildcard. Inside [...], by contrast, it only matches a period.
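
Note 4 can be checked directly. The following is a minimal sketch using two made-up input lines (they are not taken from the log in the question), and the variable names are only illustrative. An unbracketed period also matches the x in 2x5, while [.] matches only a literal period:

```shell
# Two made-up inputs: only the second contains a real decimal point.
samples=$(printf '%s\n' 'took 2x5 minutes' 'took 2.5 minutes')

# Unbracketed dot is a wildcard, so both lines are counted.
wildcard_hits=$(printf '%s\n' "$samples" | grep -c -E 'took 2.5 minutes')

# Bracketed dot is a literal period, so only the second line is counted.
literal_hits=$(printf '%s\n' "$samples" | grep -c -E 'took 2[.]5 minutes')

echo "wildcard: $wildcard_hits, literal: $literal_hits"
```

This should print "wildcard: 2, literal: 1".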

More portable version

For greps that do not support \s, try:

$ grep -E 'Info:.*took[[:space:]]*[[:digit:].]*[[:space:]]*[mhd]' logfile
Info: Executed check 'bla', result 'pass', took 2.5 minutes.
Info: Executed check 'foo', result 'pass', took 3.4 hours.
Info: Executed check 'bar', result 'pass', took 2.7 days.
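
To check how a particular grep behaves, the six sample lines from the question can be piped through a pattern directly. The sketch below is only an illustration: the variable names are made up, and the pattern is a stricter variant of the portable one (an assumption, not the exact pattern above) that requires at least one digit, an optional fractional part, and a full unit word.

```shell
# Sample lines copied from the question.
log=$(printf '%s\n' \
  "Info: Executed check 'data_existence', result 'pass', took 0 s." \
  "Info: Executed check 'message', result 'pass', took 20 s." \
  "Info: Executed check 'blu', result 'pass', took 2 minutes." \
  "Info: Executed check 'bla', result 'pass', took 2.5 minutes." \
  "Info: Executed check 'foo', result 'pass', took 3.4 hours." \
  "Info: Executed check 'bar', result 'pass', took 2.7 days.")

# Stricter variant (an assumption, not the pattern from the answer):
# one or more digits, an optional fractional part, then a unit word.
pattern='Info:.*took[[:space:]]+[[:digit:]]+([.][[:digit:]]+)?[[:space:]]+(minutes|hours|days)'

# Keep only the lines whose run time is given in minutes, hours, or days.
matches=$(printf '%s\n' "$log" | grep -E "$pattern")
printf '%s\n' "$matches"
```

On these six lines it selects the four entries for 'blu', 'bla', 'foo', and 'bar' and skips the two that report seconds. It uses only POSIX bracket classes, so it does not rely on \s or \d.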

Upvotes: 1

SomeDude

Reputation: 14238

You can try this regex: (Info: .*took\s*[0-9]*\.?[0-9]*\s*(minutes|hours|days)\.)


Upvotes: 0
