Reputation: 141
I have a log file containing a time series of events. Now, I want to analyze the data to count the number of events in different intervals. Each entry shows that an event occurred at that timestamp. For example, here is a part of the log file:
09:00:00
09:00:35
09:01:20
09:02:51
09:03:04
09:05:12
09:06:08
09:06:46
09:07:42
09:08:55
I need to count the events in 5-minute intervals. The result should look like:
09:00 5 // which means 5 events from 09:00:00 until 09:04:59
09:05 5 // which means 5 events from 09:05:00 until 09:09:59
and so on.
Do you know any trick in bash, shell, awk, ...?
Any help is appreciated.
Upvotes: 0
Views: 141
Reputation: 87
I realize this is an old question, but when I stumbled onto it I couldn't resist poking at it from another direction...
sed -e 's/:/ /' -e 's/[0-4]:.*$/0/' -e 's/[5-9]:.*$/5/' | uniq -c
In this form it reads the data from standard input; alternatively, add the filename as the final argument, before the pipe.
It's not unlike Michal's initial approach, but if you happen to need a quick and dirty analysis of a huge log, sed is a lightweight and capable tool.
The assumption is that the data truly is in a regular format - any hiccups will appear in the result.
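For example, with the log in a file (events.log is a hypothetical name):
sed -e 's/:/ /' -e 's/[0-4]:.*$/0/' -e 's/[5-9]:.*$/5/' events.log | uniq -c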
As a breakdown - given the input
09:00:35
09:01:20
09:02:51
09:03:04
09:05:12
09:06:08
and applying each edit clause individually, the intermediate results are as follows.
1) Eliminate the first colon:
-e 's/:/ /'
09 00:35
09 01:20
09 02:51
09 03:04
09 05:12
09 06:08
2) Transform minutes 0 through 4 to 0:
-e 's/[0-4]:.*$/0/'
09 00
09 00
09 00
09 00
09 05:12
09 06:08
3) Transform minutes 5-9 to 5:
-e 's/[5-9]:.*$/5/'
09 00
09 00
09 00
09 00
09 05
09 05
Clauses 2 and 3 also delete all trailing content from the lines; without that deletion the lines would remain distinct, and 'uniq -c' would fail to group them into the desired counts.
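Putting it all together on the ten sample lines from the question (again using the hypothetical events.log):
sed -e 's/:/ /' -e 's/[0-4]:.*$/0/' -e 's/[5-9]:.*$/5/' events.log | uniq -c
      5 09 00
      5 09 05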
Perhaps the biggest strength of using sed as the front end is that you can select on lines of interest, for example, if root logged in remotely:
sed -e '/sshd.*: Accepted .* for root from/!d' -e 's/:/ /' ... /var/log/secure
Upvotes: 0
Reputation: 6413
Perl with the output piped through uniq, just for fun:
$ cat file
09:00:00
09:00:35
09:01:20
09:02:51
09:03:04
09:05:12
09:06:08
09:06:46
09:07:42
09:08:55
09:18:55
09:19:55
10:09:55
10:19:55
11:21:00
Command:
perl -F: -lane 'print $F[0].sprintf(":%02d",int($F[1]/5)*5);' file | uniq -c
Output:
5 09:00
5 09:05
2 09:15
1 10:05
1 10:15
1 11:20
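To get the time-then-count layout from the question, the count column can be swapped around; a minimal sketch appended to the same pipeline:
perl -F: -lane 'print $F[0].sprintf(":%02d",int($F[1]/5)*5);' file | uniq -c | awk '{print $2, $1}'
This prints, e.g., 09:00 5 instead of 5 09:00; the buckets stay in input order, since uniq -c reports them as they appear.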
Or just perl:
perl -F: -lane '$t=$F[0].sprintf(":%02d",int($F[1]/5)*5); $c{$t}++; END { print join(" ", $_, $c{$_}) for sort keys %c }' file
Output:
09:00 5
09:05 5
09:15 2
10:05 1
10:15 1
11:20 1
Upvotes: 0
Reputation: 45243
Another way with awk:
awk -F : '{t=sprintf ("%02d",int($2/5)*5);a[$1 FS t]++}END{for (i in a) print i,a[i]}' file |sort -t: -k1n -k2n
09:00 5
09:05 5
Explanation:
use : as the field separator
int($2/5)*5 groups the minutes into 5-minute buckets (00, 05, 10, 15, ...)
a[$1 FS t]++ counts the events for each bucket
the final sort command outputs the times in order
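The bucket width is also easy to parameterize; here is a rough sketch, where the width variable w and the file name file are assumptions:
awk -F: -v w=5 '{
    t = sprintf("%s:%02d", $1, int($2/w)*w)   # bucket start time, e.g. 09:05
    a[t]++                                    # count events per bucket
} END {
    for (i in a) print i, a[i]
}' file | sort -t: -k1n -k2n
Running it with -v w=10 would count per 10-minute bucket instead.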
Upvotes: 0
Reputation: 289755
awk to the rescue.
awk -v FS="" '{min=$5<5?0:5; a[$1$2$4min]++} END{for (i in a) print i, a[i]}' file
It gets the values of the 1st, 2nd, 4th and 5th characters in every line and keeps track of how many times each combination has appeared. To group the minutes into the 0-4 and 5-9 ranges, it creates the variable min, which is 0 in the first case and 5 in the second.
With your input,
$ awk -v FS="" '{min=$5<5?0:5; a[$1$2$4min]++} END{for (i in a) print i, a[i]}' a
0900 5
0905 5
With another sample input,
$ cat a
09:00:00
09:00:35
09:01:20
09:02:51
09:03:04
09:05:12
09:06:08
09:06:46
09:07:42
09:08:55
09:18:55
09:19:55
10:09:55
10:19:55
$ awk -v FS="" '{min=$5<5?0:5; a[$1$2$4min]++} END{for (i in a) print i, a[i]}' a
0900 5
0905 5
0915 2
1005 1
1015 1
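One caveat: this relies on FS="" splitting each character into its own field, which POSIX leaves unspecified (GNU awk documents and supports it). Where portability matters, substr can do the same job; a sketch under that assumption:
awk '{
    min = substr($0,5,1)+0 < 5 ? 0 : 5       # units digit of the minutes
    a[substr($0,1,2) substr($0,4,1) min]++   # key is HHMm, e.g. 0905
} END {
    for (i in a) print i, a[i]
}' file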
Upvotes: 1