Reputation: 141
I have a log file containing a time series of events. Now, I want to analyze the data to count the number of events in different intervals. Each entry shows that an event occurred at that timestamp. For example, here is a part of the log file:
09:00:00
09:00:35
09:01:20
09:02:51
09:03:04
09:05:12
09:06:08
09:06:46
09:07:42
09:08:55
I need to count the events in 5-minute intervals. The result should look like:
09:00 5 // which means 5 events from 09:00:00 until 09:04:59
09:05 5 // which means 5 events from 09:05:00 until 09:09:59
and so on.
Do you know any trick in bash, shell, awk, ...?
Any help is appreciated.
Upvotes: 0
Views: 141
Reputation: 87
I realize this is an old question, but when I stumbled onto it I couldn't resist poking at it from another direction...
sed -e 's/:/ /' -e 's/[0-4]:.*$/0/' -e 's/[5-9]:.*$/5/' | uniq -c
In this form it reads the data from standard input; alternatively, add the filename as the final argument, before the pipe.
It's not unlike Michal's initial approach, but if you happen to need a quick and dirty analysis of a huge log, sed is a lightweight and capable tool.
The assumption is that the data truly is in a regular format - any hiccups will appear in the result.
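For example, with the log in a file (events.log is a hypothetical name):
sed -e 's/:/ /' -e 's/[0-4]:.*$/0/' -e 's/[5-9]:.*$/5/' events.log | uniq -c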
As a breakdown - given the input
09:00:35
09:01:20
09:02:51
09:03:04
09:05:12
09:06:08
and applying each edit clause individually, the intermediate results are as follows.
1) Eliminate the first colon:
-e 's/:/ /'
09 00:35
09 01:20
09 02:51
09 03:04
09 05:12
09 06:08
2) Transform minutes 0 through 4 to 0:
-e 's/[0-4]:.*$/0/'
09 00
09 00
09 00
09 00
09 05:12
09 06:08
3) Transform minutes 5-9 to 5:
-e 's/[5-9]:.*$/5/'
09 00
09 00
09 00
09 00
09 05
09 05
Clauses 2 and 3 also delete all trailing content from the lines; without that deletion the lines would remain distinct, and 'uniq -c' would fail to group them into the desired counts.
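Putting it all together on the ten sample lines from the question (again using the hypothetical events.log):
sed -e 's/:/ /' -e 's/[0-4]:.*$/0/' -e 's/[5-9]:.*$/5/' events.log | uniq -c
      5 09 00
      5 09 05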
Perhaps the biggest strength of using sed as the front end is that you can select on lines of interest, for example, if root logged in remotely:
sed -e '/sshd.*: Accepted .* for root from/!d' -e 's/:/ /' ... /var/log/secure
Upvotes: 0
Reputation: 6413
Perl with the output piped through uniq, just for fun:
$ cat file
09:00:00
09:00:35
09:01:20
09:02:51
09:03:04
09:05:12
09:06:08
09:06:46
09:07:42
09:08:55
09:18:55
09:19:55
10:09:55
10:19:55
11:21:00
Command:
perl -F: -lane 'print $F[0].sprintf(":%02d",int($F[1]/5)*5);' file | uniq -c
Output:
5 09:00
5 09:05
2 09:15
1 10:05
1 10:15
1 11:20
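To get the time-then-count layout from the question, the count column can be swapped around; a minimal sketch appended to the same pipeline:
perl -F: -lane 'print $F[0].sprintf(":%02d",int($F[1]/5)*5);' file | uniq -c | awk '{print $2, $1}'
This prints, e.g., 09:00 5 instead of 5 09:00; the buckets stay in input order, since uniq -c reports them as they appear.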
Or just perl:
perl -F: -lane '$t=$F[0].sprintf(":%02d",int($F[1]/5)*5); $c{$t}++; END { print join(" ", $_, $c{$_}) for sort keys %c }' file
Output:
09:00 5
09:05 5
09:15 2
10:05 1
10:15 1
11:20 1
Upvotes: 0
Reputation: 45243
Another way with awk:
awk -F : '{t=sprintf ("%02d",int($2/5)*5);a[$1 FS t]++}END{for (i in a) print i,a[i]}' file |sort -t: -k1n -k2n
09:00 5
09:05 5
Explanation:
use : as the field separator
int($2/5)*5 groups the minutes into 5-minute buckets (00, 05, 10, 15, ...)
a[$1 FS t]++ counts the events for each bucket
the final sort command outputs the times in order
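The bucket width is also easy to parameterize; here is a rough sketch, where the width variable w and the file name file are assumptions:
awk -F: -v w=5 '{
    t = sprintf("%s:%02d", $1, int($2/w)*w)   # bucket start time, e.g. 09:05
    a[t]++                                    # count events per bucket
} END {
    for (i in a) print i, a[i]
}' file | sort -t: -k1n -k2n
Running it with -v w=10 would count per 10-minute bucket instead.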
Upvotes: 0
Reputation: 289755
awk to the rescue.
awk -v FS="" '{min=$5<5?0:5; a[$1$2$4min]++} END{for (i in a) print i, a[i]}' file
It gets the values of the 1st, 2nd, 4th and 5th characters in every line and keeps track of how many times each combination has appeared. To group the minutes into the 0-4 and 5-9 ranges, it creates the variable min, which is 0 in the first case and 5 in the second.
With your input,
$ awk -v FS="" '{min=$5<5?0:5; a[$1$2$4min]++} END{for (i in a) print i, a[i]}' a
0900 5
0905 5
With another sample input,
$ cat a
09:00:00
09:00:35
09:01:20
09:02:51
09:03:04
09:05:12
09:06:08
09:06:46
09:07:42
09:08:55
09:18:55
09:19:55
10:09:55
10:19:55
$ awk -v FS="" '{min=$5<5?0:5; a[$1$2$4min]++} END{for (i in a) print i, a[i]}' a
0900 5
0905 5
0915 2
1005 1
1015 1
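One caveat: this relies on FS="" splitting each character into its own field, which POSIX leaves unspecified (GNU awk documents and supports it). Where portability matters, substr can do the same job; a sketch under that assumption:
awk '{
    min = substr($0,5,1)+0 < 5 ? 0 : 5       # units digit of the minutes
    a[substr($0,1,2) substr($0,4,1) min]++   # key is HHMm, e.g. 0905
} END {
    for (i in a) print i, a[i]
}' file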
Upvotes: 1