Shreyash
Shreyash

Reputation: 456

Find time difference in milliseconds awk or gawk

I have file with huge data into following format i want to find out the difference between two timestamp in millisecond and add a column at the last with time difference of that row

22159   | a | 2021-02-26 11:02:03.776 | 2021-02-26 11:02:04.740 <br/>
22160   | b | 2021-02-26 11:35:21.796 | 2021-02-26 11:35:22.674 <br/>
22161   | c | 2021-02-26 11:35:21.806 | 2021-02-26 11:35:22.841 <br/>
22161   | d | 2021-02-26 11:02:18.688 | 2021-02-26 11:02:19.594 <br/>
22182   | e | 2021-02-26 11:06:02.978 | 2021-02-26 11:06:03.815 <br/>
22183   | f | 2021-02-26 11:35:24.911 | 2021-02-26 11:35:25.791 <br/>
22184   | g | 2021-02-26 11:35:25.082 | 2021-02-26 11:35:26.121 <br/>
22199   | h | 2021-02-26 11:09:47.815 | 2021-02-26 11:09:48.499 <br/>
22200   | i | 2021-02-26 11:35:27.562 | 2021-02-26 11:35:28.660 <br/>
22200   | j | 2021-02-26 11:09:49.595 | 2021-02-26 11:09:50.596 <br/>

Output eg.

9535 a 2021-02-27 11:02:53.756 2021-02-27 11:02:53.947 0.191

I have tied below command:

awk -F'|' 'function convert(t) {   cmd = "date +%s.%3N -d \""t"\" "; cmd|getline timemilli; return timemilli; } { t2=convert($4);t1=convert($3);printf $1"\t"$2"\t"$3"\t"$4"\t%.3f\n",t2-t1 }' filtered_data

Its working fine for small file but its giving me error for huge files

Error:

awk: cmd. line:1: (FILENAME=filtered_data FNR=516) fatal: cannot open pipe date +%s.%3N -d "2021-02-27 11:24:05.618" ' (Too many open files)`

Note: my file is around 10 mb and i want to find difference in single shot. not looping over each line.

Is there any other way to do this in single shot?

Upvotes: 1

Views: 1511

Answers (3)

anubhava
anubhava

Reputation: 785246

You may try this awk:

awk -F'|' '
function convert(t,  cmd, timemilli) {
   cmd = "date +%s.%3N -d \""t"\" "
   cmd | getline timemilli
   close (cmd)  # close this cmd to avoid too many open files
   return timemilli
}
{
   t2=convert($4)
   t1=convert($3)
   printf "%s\t%.3f\n", $1"\t"$2"\t"$3"\t"$4, t2-t1
}' filtered_data

Upvotes: 3

urznow
urznow

Reputation: 1801

For mawk 1.3.4. See mawk documentation on mktime for the value of daylight savings time dst.

LC_ALL=C awk -v FS=' +\| ' -v OFS='|' -v RS=' <br/>\n' -v OFMT='%.3f' -v dst='-1' '
function fn(s) {
    # return epoch seconds, with milliseconds as fraction
    gsub(/-|:/," ",s)
    return mktime(substr(s,1,19) " " dst) + substr(s,21,3)/1000
}{ $(NF+1) = fn($4) - fn($3) }
1
' data

EDIT Edited FS and RS to match OP's new input format.

Upvotes: 1

James Brown
James Brown

Reputation: 37404

Using GNU awk since mktime() and gensub():

$ gawk '
BEGIN {
    FS=" [|] "
}
function s(dt) {             # function to deal with dt conversion
    return sprintf("%.3f\n", mktime(gensub(/[- :.]/," ","g",dt))+gensub(/^[^.]+/,"","g",dt))
}
{
    $1=$1                    # rebuild the record for requested output
    print $0,s($3)-s($4)     # output
}' file

Output:

22159 a 2021-02-26 11:02:03.776 2021-02-26 11:02:04.740 -0.964
22160 b 2021-02-26 11:35:21.796 2021-02-26 11:35:22.674 -0.878
22161 c 2021-02-26 11:35:21.806 2021-02-26 11:35:22.841 -1.035
...

Notice: mktime() requires time in form "YYYY MM DD HH MM SS [DST]". Above I'm abusing (due to laziness) and feeding it form "YYYY MM DD HH MM SS sss" where sss is the milliseconds. It seems to work but you may want to fix that with for example substr() or similar.

Upvotes: 2

Related Questions