Reputation: 585
I have a CSV data file with two timestamp fields, start_time and end_time. They are strings in the form "2014-02-01 00:06:22". Each line of the file is a record with multiple fields. The file is pretty small.
I want to calculate the average duration (end_time minus start_time) across all records. Other than writing a shell script, is there a one-liner I could use for this kind of simple calculation, possibly with awk?
I'm very new to awk. Here's what I have, but it does not work ($6 and $7 are the start_time and end_time fields):
awk -F, 'BEGIN { count=0; total=0 }
{ sec1=date +%s -d $6; sec2=date +%s -d $7; total+=sec2-sec1; count++ }
END { print "avg trip time: ", total/count }' dataset.csv
Sample of the CSV file:
"start_time","stop_time","start station name","end station name","bike_id"
"2014-02-01 00:00:00","2014-02-01 00:06:22","Washington Square E","Stanton St & Chrystie St","21101"
Upvotes: 0
Views: 354
Reputation: 203334
Using GNU awk for mktime() and gensub():
$ cat tst.awk
BEGIN { FS="^\"|\",\"" }
function t2s(time) { return mktime(gensub(/[-:]/," ","g",time)) }
NR>1 { totDurs += (t2s($3) - t2s($2)) }
END { print totDurs / (NR-1) }
$ gawk -f tst.awk file
382
With other awks you need to call the shell date command instead:
$ cat tst2.awk
BEGIN { FS="^\"|\",\"" }
function t2s(time,   cmd,secs) {
    cmd = "date +%s -d \"" time "\""
    if ( (cmd | getline secs) <= 0 ) {
        secs = -1
    }
    close(cmd)
    return secs
}
NR>1 { totDurs += (t2s($3) - t2s($2)) }
END { print totDurs / (NR-1) }
$ awk -f tst2.awk file
382
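The cmd | getline construct used in t2s() is plain POSIX awk, so it works in any awk; a minimal standalone sketch of the pattern (assuming GNU date with -d):

```shell
# Minimal sketch of the cmd | getline pattern (assumes GNU date supporting -d).
awk 'BEGIN {
    cmd = "date +%s -d \"2014-02-01 00:06:22\""
    if ( (cmd | getline secs) <= 0 ) secs = -1   # getline returns <= 0 on failure
    close(cmd)                                   # close so the command can be rerun
    print secs                                   # epoch seconds (timezone-dependent)
}'
```

Note that this spawns one date process per timestamp, which is why the gawk mktime() version is preferable for anything but small files.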
Upvotes: 1