punekr12

Reputation: 733

How to grep lines starting with a digit or white space

I need to count messages per hour in my log file. Every line of the log file begins with a timestamp, so I am using the following for loop and grep command to do this:

for i in `seq 0 23`
do egrep "$i:[0-9][0-9]:[0-9][0-9] <some_pattern>" filename | wc -l
done

This gives me the number of messages for each hour from 0 to 23.

However, this does not work for a single-digit hour such as 5:23:32, because it is preceded by a space. The grep would then have to be:

egrep " $i:[0-9][0-9]:[0-9][0-9] <some_pattern>" filename | wc -l

Otherwise, it would incorrectly match lines starting with, say, 15:23:32.

So how can I tell grep that a digit may be preceded only by a space or by the start of the line?

Upvotes: 4

Views: 31505

Answers (4)

repzero

Reputation: 8412

Using egrep

for i in `seq 0 23`; do egrep -c "^[[:space:]]*$i:[0-9][0-9]:[0-9][0-9] <some_pattern>" 'filename'; done

The pattern ^[[:space:]]*$i:[0-9][0-9]:[0-9][0-9] is anchored to the start of the line, so grep matches a line whether it begins with whitespace followed by the hour or with the hour itself. Because of the anchor, an hour such as 5 can no longer match the second digit of 15.
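A minimal sketch of the anchored pattern, wrapped in a hypothetical shell function count_hour that reads the log on standard input. The sample lines are made up, <some_pattern> is left out to keep it short, and grep -E is used instead of egrep, which newer GNU grep releases flag as obsolescent.

```shell
# count_hour is a hypothetical helper: it reads log lines on stdin and counts
# the lines whose leading timestamp has hour $1, allowing optional whitespace
# before the hour (the ^[[:space:]]* prefix from the answer above).
count_hour() {
    # grep -c prints 0 and exits with status 1 when nothing matches;
    # "|| true" keeps that status from stopping scripts run with "set -e".
    grep -Ec "^[[:space:]]*$1:[0-9][0-9]:[0-9][0-9]" || true
}

# Made-up sample: a space-padded 5 o'clock line and a 15 o'clock line.
sample=' 5:23:23 first message
15:23:23 second message'

printf '%s\n' "$sample" | count_hour 5    # prints 1: only " 5:23:23" matches
printf '%s\n' "$sample" | count_hour 15   # prints 1: only "15:23:23" matches
```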

For example, with $i=5 your command matches both of these lines:

5:23:23
   15:23:23

Using the command above, only this line matches:

 5:23:23

grep has a -c option that prints the number of matching lines, so you can use it instead of piping to wc -l, as the command above does.
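A sketch of the whole per-hour loop using grep -c, under a few assumptions of mine: per_hour_counts is a hypothetical function name, the file is passed as an argument, and a while loop replaces seq so it runs in a plain POSIX sh. It prints one "HOUR COUNT" line per hour, including hours with zero matches.

```shell
# per_hour_counts is a hypothetical helper: for each hour 0..23 it prints
# "HOUR COUNT", counting matching lines in file $1 with grep -c instead of
# piping grep's output to wc -l.
per_hour_counts() {
    file=$1
    h=0
    while [ "$h" -le 23 ]; do
        # "|| true": grep -c exits 1 (after printing 0) for hours with no lines.
        n=$(grep -Ec "^[[:space:]]*$h:[0-9][0-9]:[0-9][0-9]" "$file" || true)
        printf '%s %s\n' "$h" "$n"
        h=$((h + 1))
    done
}

# Usage with a throwaway file of made-up log lines.
tmp=$(mktemp)
printf '%s\n' ' 5:23:23 a' ' 5:40:01 b' '15:23:23 c' > "$tmp"
per_hour_counts "$tmp"    # among the 24 lines: "5 2" and "15 1"
rm -f "$tmp"
```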

Upvotes: 1

glenn jackman

Reputation: 246807

To match a timestamp where the hour from 0 to 9 is space-padded or zero-padded:

With basic regular expressions

grep '^\([ 01][0-9]\|2[0-3]\):[0-5][0-9]:[0-5][0-9]' file

or extended regular expressions

grep -E '^([ 01][0-9]|2[0-3])(:[0-5][0-9]){2}' file

ref: https://www.gnu.org/software/gnulib/manual/html_node/Regular-expression-syntaxes.html
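A small sketch that runs the extended regex above against a few made-up lines, wrapped in a hypothetical function valid_ts. It shows which hour formats the pattern accepts: a space-padded or zero-padded single-digit hour passes, while an out-of-range hour or an unpadded hour at the very start of the line is rejected.

```shell
# valid_ts is a hypothetical helper that keeps only the lines whose leading
# timestamp has a valid two-character hour (" 0"-" 9", "00"-"19", "20"-"23")
# and valid minutes and seconds, using the ERE from the answer above.
valid_ts() {
    grep -E '^([ 01][0-9]|2[0-3])(:[0-5][0-9]){2}' || true
}

printf '%s\n' \
    ' 5:23:23 space-padded hour' \
    '05:23:23 zero-padded hour' \
    '15:23:23 two-digit hour' \
    '24:00:00 hour out of range' \
    '5:23:23 hour not padded' | valid_ts
# Only the first three lines are printed.
```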

Upvotes: 0

David Hoelzer
David Hoelzer

Reputation: 16331

grep "^[ 0-9][0-9]...

I think this is what you're looking for, unless I've misunderstood your question. Add the space to the first bracket expression as an alternative, and anchor the pattern to the beginning of the line.
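A sketch of the same idea applied to one hour at a time, under my own assumptions: hour_regex is a hypothetical helper, the hour field is assumed to be two characters wide, and single-digit hours are allowed to be padded with either a space or a zero. The leading bracket expression does the work that the question asks for: it lets a single-digit hour be preceded by a space but not by another digit.

```shell
# hour_regex is a hypothetical helper that prints an anchored basic regex for
# hour $1. Single-digit hours get a leading bracket expression that accepts a
# space (or, as an extra assumption, a zero): hour 5 -> ^[ 0]5:
# Two-digit hours are matched literally: hour 15 -> ^15:
hour_regex() {
    if [ "$1" -lt 10 ]; then
        printf '^[ 0]%s:' "$1"
    else
        printf '^%s:' "$1"
    fi
}

lines=' 5:23:23 a
05:10:00 b
15:23:23 c'

printf '%s\n' "$lines" | grep -c "$(hour_regex 5)"     # prints 2
printf '%s\n' "$lines" | grep -c "$(hour_regex 15)"    # prints 1
```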

Upvotes: 0

Adam Katz

Reputation: 16138

I think I can get rid of your for loop. This works if the time (rather than a date) begins each line:

$ awk -F : '/some_pattern/ { print $1 }' file |sort |uniq -c

This searches for your desired pattern (much like grep does), then prints the first field (as delimited by a colon), which is the hour. The hours are then sorted, and uniq -c counts each distinct hour and prints the counts to standard output.
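A runnable sketch of this pipeline on a few made-up lines that start with the time. hour_counts is a hypothetical wrapper name, and the answer's /some_pattern/ placeholder is kept as a literal word in the sample so the example produces output.

```shell
# hour_counts is a hypothetical wrapper around the answer's pipeline: it reads
# log lines on stdin, keeps those containing "some_pattern" and prints how many
# fall in each hour (the text before the first colon).
hour_counts() {
    awk -F : '/some_pattern/ { print $1 }' | sort | uniq -c
}

# Made-up log lines that each start with the time.
printf '%s\n' \
    '07:15:02 some_pattern started' \
    '07:48:30 some_pattern retried' \
    '08:01:11 unrelated message' \
    '08:30:00 some_pattern finished' | hour_counts
# Expect a count of 2 for hour 07 and 1 for hour 08 (uniq -c pads the counts).
```

With space-padded hours such as " 5:23:23", the first field is " 5" (including the space), which still groups correctly because every line for that hour carries the same padding.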

However, let's say your logs look like /var/log/syslog, which has lines that look like this:

Feb  9 01:23:45 mycomputer service[PID]: details...

In this case, you have to tell AWK where to look:

$ awk '/some_pattern/ { gsub(/:.*/,"",$3); print $3 }' file |sort |uniq -c

This searches for your desired pattern (much like grep does), then deletes the first colon and everything after it from the third field (the time) and prints what remains (the hour). The rest works as described above.
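A sketch of the syslog variant on made-up lines in the layout shown above; syslog_hour_counts is a hypothetical wrapper name. With awk's default field splitting, runs of spaces count as one separator, so "Feb  9" and "Feb 10" both leave the time in $3.

```shell
# syslog_hour_counts is a hypothetical wrapper around the syslog variant: awk
# splits each line on whitespace, so $3 is the time; gsub() deletes the first
# colon and everything after it, leaving only the hour.
syslog_hour_counts() {
    awk '/some_pattern/ { gsub(/:.*/, "", $3); print $3 }' | sort | uniq -c
}

# Made-up lines in the /var/log/syslog layout shown above.
printf '%s\n' \
    'Feb  9 01:23:45 mycomputer service[123]: some_pattern seen' \
    'Feb  9 01:59:01 mycomputer service[123]: some_pattern seen again' \
    'Feb 10 02:00:10 mycomputer service[123]: some_pattern seen once more' |
    syslog_hour_counts
# Expect a count of 2 for hour 01 and 1 for hour 02.
```

Note that only the hour is kept, so matches from the same hour on different days are counted together.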

A sample output (of either of the above variants):

 12 07
 34 08
 30 09
 51 10
536 11
346 12
123 13

This shows that there were twelve matches to my query during the 7 am hour and that I didn't really start using this system until 11 am.

Upvotes: 1
