Reputation: 733
I need to count messages per hour in my log file. Every line in the log file starts with a timestamp, so I am using the following 'for' loop with 'grep' to do this:
for i in `seq 0 23`
do egrep "$i:[0-9][0-9]:[0-9][0-9] <some_pattern>" filename | wc -l
done
This gives me the number of messages per hour for hours 0 to 23.
However, it does not work for a single-digit hour such as 5:23:32, because it is preceded by a white space. In that case the grep would have to be:
egrep " $i:[0-9][0-9]:[0-9][0-9] <some_pattern>" filename | wc -l
Otherwise it will incorrectly match lines starting with, say, 15:23:32.
So how can I tell grep that the digit may be preceded only by a space or by the start of the line?
Upvotes: 4
Views: 31505
Reputation: 8412
Using egrep
for i in `seq 0 23`; do egrep -c "^[[:space:]]*$i:[0-9][0-9]:[0-9][0-9] <some_pattern>" 'filename'; done
^[[:space:]]*$i:[0-9][0-9]:[0-9][0-9]
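This tells egrep to match from the start of the line (^), then zero or more whitespace characters ([[:space:]]*), then your pattern. The line therefore matches whether it starts with whitespace or directly with the hour. Because the match is anchored to the start of the line, the hour can no longer match in the middle of a longer number such as 15.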
For example, using your command with a pattern to find 5:23:32 (where $i=5), we get
5:23:23
15:23:23
using the command above, we get
5:23:23
grep also has a -c option that counts matching lines, so you can use it instead of piping to wc -l. For example:
for i in `seq 0 23`; do egrep -c "^[[:space:]]*$i:[0-9][0-9]:[0-9][0-9] <pattern>" 'filename'; done
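To see the effect of the anchor, here is a minimal sketch run against a throwaway sample log. The sample lines and the ERROR pattern are made up for illustration, and grep -E is used in place of egrep (the two are equivalent).

```shell
# Hypothetical sample log; its contents and the ERROR pattern are assumptions.
log=$(mktemp)
printf '%s\n' \
  ' 5:23:32 ERROR disk full' \
  '15:23:32 ERROR disk full' \
  '15:40:01 ERROR timeout' \
  ' 5:59:59 INFO ok' > "$log"

# Unanchored: hour 5 also matches the tail of "15:..." timestamps.
unanchored=$(grep -Ec "5:[0-9][0-9]:[0-9][0-9] ERROR" "$log")

# Anchored: start of line, optional whitespace, then the hour.
anchored=$(grep -Ec "^[[:space:]]*5:[0-9][0-9]:[0-9][0-9] ERROR" "$log")

echo "unanchored=$unanchored anchored=$anchored"
rm -f "$log"
```

With this sample the unanchored pattern counts the two 15:xx lines as well, while the anchored one counts only the line that really belongs to hour 5.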
Upvotes: 1
Reputation: 246807
To match a timestamp where the hour from 0 to 9 is space-padded or zero-padded:
With basic regular expressions
grep '^\([ 01][0-9]\|2[0-3]\):[0-5][0-9]:[0-5][0-9]' file
or extended regular expressions
grep -E '^([ 01][0-9]|2[0-3])(:[0-5][0-9]){2}' file
ref: https://www.gnu.org/software/gnulib/manual/html_node/Regular-expression-syntaxes.html
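A quick way to check which hour formats the extended expression accepts is to feed it a few sample timestamps. This is only a sketch; the sample lines are invented for the test.

```shell
# The extended regular expression from above, kept in a variable.
re='^([ 01][0-9]|2[0-3])(:[0-5][0-9]){2}'

# Invented sample lines: space-padded, zero-padded, two-digit,
# an invalid hour (24), and an unpadded single-digit hour.
matched=$(printf '%s\n' \
  ' 5:23:32 a' \
  '05:23:32 b' \
  '15:23:32 c' \
  '23:59:59 d' \
  '24:00:00 e' \
  '5:23:32 f' | grep -E "$re")

count=$(printf '%s\n' "$matched" | wc -l | tr -d ' ')
printf '%s\n' "$matched"
echo "count=$count"
```

The first four lines match. The 24:00:00 line is rejected as an invalid hour, and the unpadded 5:23:32 line is rejected because the expression expects the hour to occupy two characters.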
Upvotes: 0
Reputation: 16331
grep "^[ 0-9][0-9]...
I think this is what you're looking for unless I've misunderstood your question. Add the whitespace to the first set as an option and anchor it to the beginning of the line.
Upvotes: 0
Reputation: 16138
I think I can get rid of your for loop. This will work if the time (rather than a date) begins each line:
$ awk -F : '/some_pattern/ { print $1 }' file |sort |uniq -c
This searches for your desired pattern (kind of like grep), then prints the first element (as delimited by a colon), which would be the hour. The output is then sorted, and repeats of each unique element are counted and displayed on standard output.
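As a small sketch of that pipeline in action, the following runs it on a made-up log in which the time begins each line. The sample lines are assumptions for illustration only.

```shell
# Hypothetical log where each line starts with HH:MM:SS.
log=$(mktemp)
printf '%s\n' \
  '07:01:10 some_pattern a' \
  '07:15:42 some_pattern b' \
  '08:00:03 other c' \
  '11:30:00 some_pattern d' > "$log"

# Print the hour of each matching line, then count repeats per hour.
counts=$(awk -F : '/some_pattern/ { print $1 }' "$log" | sort | uniq -c)

echo "$counts"
rm -f "$log"
```

Each output line is a count followed by the hour; the exact padding in front of the count depends on the uniq implementation.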
However, let's say your logs look like /var/log/syslog, which has lines that look like this:
Feb 9 01:23:45 mycomputer service[PID]: details...
In this case, you have to tell AWK where to look:
$ awk '/some_pattern/ { gsub(/:.*/,"",$3); print $3 }' file |sort |uniq -c
This searches for your desired pattern (kind of like grep), then removes everything from the first colon onward in the third element (the time) and prints what remains (the hour). The rest is as described above.
A sample output (of either of the above variants):
12 07
34 08
30 09
51 10
536 11
346 12
123 13
This notes that there were twelve matches to my query at 7 am and that I didn't really start using this system until 11 am.
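If you would rather not pipe through sort and uniq, one possible variation (not the pipeline above, but a different technique) is to let AWK do the counting in an associative array. A minimal sketch, assuming syslog-style lines with the time in the third field; the sample lines are invented:

```shell
# Hypothetical syslog-style sample; the time is the third field.
log=$(mktemp)
printf '%s\n' \
  'Feb  9 01:23:45 mycomputer service[1]: some_pattern x' \
  'Feb  9 01:59:00 mycomputer service[1]: some_pattern y' \
  'Feb  9 02:10:00 mycomputer service[1]: unrelated' \
  'Feb  9 13:05:07 mycomputer service[1]: some_pattern z' > "$log"

# Split the time on ":" and tally matches per hour in the array n.
# "for (h in n)" has no guaranteed order, so sort by the hour column afterwards.
hours=$(awk '/some_pattern/ { split($3, t, ":"); n[t[1]]++ }
             END { for (h in n) print n[h], h }' "$log" | sort -k2,2)

echo "$hours"
rm -f "$log"
```

The output has the same count-then-hour layout as the sample above, just without the leading padding that uniq -c adds.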
Upvotes: 1