Dragos Cazangiu

Reputation: 157

Making a statistic in bash

I have a file containing about 1000 lines that are pretty much like this:

0,23423423,7ds5dsfdf,2008-08-03,19:00:01,101,hJ890
1,54645645,f9g8f9gd7,2008-08-03,19:00:20,113,Lg78s
1,54645645,f9g8f9gd7,2008-08-03,19:00:09,108,Lg78s
0,54645645,f9g8f9gd7,2008-08-03,19:00:01,130,dsf98
1,54645645,f9g8f9gd7,2008-08-03,19:00:20,105,Lg78s

The column after the time is a number of seconds. How can I sum the seconds for each date and time in the file and list the results from the earliest date to the latest? For example, I should get something like:

The date Sun Aug  3 19:00:01 EEST 2008 has 231 seconds
The date Sun Aug  3 19:00:09 EEST 2008 has 108 seconds
The date Sun Aug  3 19:00:20 EEST 2008 has 218 seconds

I tried something like this:

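while IFS= read -r line
do
    # take the date and time (fields 4 and 5) of the current line
    date=$(printf '%s\n' "$line" | awk -F "," '{print $4","$5}')
    # collect every line with the same date and time ("file" is a placeholder name)
    var=$(grep "$date" file)
done < file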

After I find an instance of a certain date, how can I select the number of seconds corresponding to it?

Thanks!

Upvotes: 1

Views: 198

Answers (2)

RavinderSingh13

Reputation: 133518

Could you please try the following awk command and let me know if it helps. Note that mktime() and strftime() are GNU awk extensions. I will add a non-one-liner form of it shortly.

awk -F, '{s=$4 " " $5; gsub(/[:-]/, " ", s); t=mktime(s); dt=strftime("%c", t); a[t]=dt; b[t]+=$6} END{for(i in a) print a[i] " has " b[i] " seconds"}'  Input_file

Thanks to Anubhava for correcting my code.
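
A multi-line form of the one-liner might look like the sketch below. It assumes an awk that provides mktime() and strftime() (GNU awk does). The function name sum_seconds_by_date is hypothetical, and the sort and cut steps are an addition to the one-liner: awk does not guarantee any order for "for (i in a)", so the epoch value is printed as a helper column, used for sorting, and then removed.

```shell
# Hypothetical helper: pass the input file as the first argument.
sum_seconds_by_date() {
    awk -F, '
    {
        s = $4 " " $5            # e.g. "2008-08-03 19:00:01"
        gsub(/[:-]/, " ", s)     # mktime() expects "YYYY MM DD HH MM SS"
        t = mktime(s)            # seconds since the epoch, local time zone
        total[t] += $6           # add the seconds of this line to its timestamp
    }
    END {
        # print the epoch first so that sort can order the lines by time
        for (t in total)
            print t "\t" strftime("%c", t) " has " total[t] " seconds"
    }' "$1" |
    sort -n |
    cut -f2-
}
```

Call it as: sum_seconds_by_date Input_file. The exact date text produced by "%c" depends on the current locale and time zone.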

Upvotes: 2

anubhava

Reputation: 785156

You can use this awk:

awk -F, '{cmd="date -d \"" $4 " " $5 "\""; cmd | getline dt; close(cmd); a[dt] += $6}
END{for (i in a) print i " has " a[i] " seconds"}' file

Sun Aug  3 19:00:09 EDT 2008 has 108 seconds
Sun Aug  3 19:00:20 EDT 2008 has 218 seconds
Sun Aug  3 19:00:01 EDT 2008 has 231 seconds

This awk command:
- uses a comma as the input field separator;
- builds a date string from columns 4 and 5 and passes it to date -d;
- uses an associative array keyed by the formatted date, whose value is the running sum of the seconds column (column 6).

Reference: Effective AWK Programming
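
The command above starts one date process per input line. For larger files, a possible variation is to call date only once per distinct timestamp and cache the result. This is a minimal sketch, not part of the original answer: the function name sum_seconds_cached and the array names are hypothetical, and it assumes a date that accepts -d (GNU date does; BSD date does not).

```shell
# Hypothetical helper: pass the input file as the first argument.
sum_seconds_cached() {
    awk -F, '
    {
        s = $4 " " $5
        if (!(s in label)) {                 # first time this timestamp is seen
            cmd = "date -d \"" s "\""        # date -d is a GNU option
            if ((cmd | getline dt) > 0)
                label[s] = dt                # formatted date printed by date
            else
                label[s] = s                 # fall back to the raw timestamp
            close(cmd)
        }
        total[s] += $6                       # running sum per timestamp
    }
    END {
        for (s in total)
            print label[s] " has " total[s] " seconds"
    }' "$1"
}
```

As with the original command, the output order is unspecified, so pipe it through sort if order matters.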

The order of "for (i in a)" is unspecified in awk, so if you want the dates sorted, combine awk with sort and cut like this:

awk -F, '{s=$4 " " $5; cmd="date -d \"" s "\""; cmd | getline dt; close(cmd);
a[dt] += $6; b[dt]=s} END{for (i in a) print b[i] ";" i " has " a[i] " seconds"}' file |
sort -t ';' -k1,2 |
cut -d ';' -f2-

Sun Aug  3 19:00:01 EDT 2008 has 231 seconds
Sun Aug  3 19:00:09 EDT 2008 has 108 seconds
Sun Aug  3 19:00:20 EDT 2008 has 218 seconds
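
If the date-style text is not required, the same grouping can be done without calling date at all, because timestamps in the form YYYY-MM-DD HH:MM:SS sort chronologically as plain strings. The sketch below is an alternative under that assumption, not the method used above. The function name sum_seconds_iso is hypothetical.

```shell
# Hypothetical helper: pass the input file as the first argument.
sum_seconds_iso() {
    awk -F, '
    { total[$4 " " $5] += $6 }       # key looks like "2008-08-03 19:00:01"
    END {
        for (k in total)
            print "The date " k " has " total[k] " seconds"
    }' "$1" |
    sort -k3,4                       # fields 3 and 4 hold the date and the time
}
```

For the five sample lines in the question this prints:

The date 2008-08-03 19:00:01 has 231 seconds
The date 2008-08-03 19:00:09 has 108 seconds
The date 2008-08-03 19:00:20 has 218 seconds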

Upvotes: 4
