Reputation: 456
I have file with huge data into following format i want to find out the difference between two timestamp in millisecond and add a column at the last with time difference of that row
22159 | a | 2021-02-26 11:02:03.776 | 2021-02-26 11:02:04.740 <br/>
22160 | b | 2021-02-26 11:35:21.796 | 2021-02-26 11:35:22.674 <br/>
22161 | c | 2021-02-26 11:35:21.806 | 2021-02-26 11:35:22.841 <br/>
22161 | d | 2021-02-26 11:02:18.688 | 2021-02-26 11:02:19.594 <br/>
22182 | e | 2021-02-26 11:06:02.978 | 2021-02-26 11:06:03.815 <br/>
22183 | f | 2021-02-26 11:35:24.911 | 2021-02-26 11:35:25.791 <br/>
22184 | g | 2021-02-26 11:35:25.082 | 2021-02-26 11:35:26.121 <br/>
22199 | h | 2021-02-26 11:09:47.815 | 2021-02-26 11:09:48.499 <br/>
22200 | i | 2021-02-26 11:35:27.562 | 2021-02-26 11:35:28.660 <br/>
22200 | j | 2021-02-26 11:09:49.595 | 2021-02-26 11:09:50.596 <br/>
Output eg.
9535 a 2021-02-27 11:02:53.756 2021-02-27 11:02:53.947 0.191
I have tied below command:
awk -F'|' 'function convert(t) { cmd = "date +%s.%3N -d \""t"\" "; cmd|getline timemilli; return timemilli; } { t2=convert($4);t1=convert($3);printf $1"\t"$2"\t"$3"\t"$4"\t%.3f\n",t2-t1 }' filtered_data
Its working fine for small file but its giving me error for huge files
Error:
awk: cmd. line:1: (FILENAME=filtered_data FNR=516) fatal: cannot open pipe
date +%s.%3N -d "2021-02-27 11:24:05.618" ' (Too many open files)`
Note: my file is around 10 mb and i want to find difference in single shot. not looping over each line.
Is there any other way to do this in single shot?
Upvotes: 1
Views: 1511
Reputation: 785246
You may try this awk
:
awk -F'|' '
function convert(t, cmd, timemilli) {
cmd = "date +%s.%3N -d \""t"\" "
cmd | getline timemilli
close (cmd) # close this cmd to avoid too many open files
return timemilli
}
{
t2=convert($4)
t1=convert($3)
printf "%s\t%.3f\n", $1"\t"$2"\t"$3"\t"$4, t2-t1
}' filtered_data
Upvotes: 3
Reputation: 1801
For mawk
1.3.4. See mawk documentation on mktime
for the value of
daylight savings time dst
.
LC_ALL=C awk -v FS=' +\| ' -v OFS='|' -v RS=' <br/>\n' -v OFMT='%.3f' -v dst='-1' '
function fn(s) {
# return epoch seconds, with milliseconds as fraction
gsub(/-|:/," ",s)
return mktime(substr(s,1,19) " " dst) + substr(s,21,3)/1000
}{ $(NF+1) = fn($4) - fn($3) }
1
' data
EDIT Edited FS
and RS
to match OP's new input format.
Upvotes: 1
Reputation: 37404
Using GNU awk since mktime()
and gensub()
:
$ gawk '
BEGIN {
FS=" [|] "
}
function s(dt) { # function to deal with dt conversion
return sprintf("%.3f\n", mktime(gensub(/[- :.]/," ","g",dt))+gensub(/^[^.]+/,"","g",dt))
}
{
$1=$1 # rebuild the record for requested output
print $0,s($3)-s($4) # output
}' file
Output:
22159 a 2021-02-26 11:02:03.776 2021-02-26 11:02:04.740 -0.964
22160 b 2021-02-26 11:35:21.796 2021-02-26 11:35:22.674 -0.878
22161 c 2021-02-26 11:35:21.806 2021-02-26 11:35:22.841 -1.035
...
Notice: mktime()
requires time in form "YYYY MM DD HH MM SS [DST]"
. Above I'm abusing (due to laziness) and feeding it form "YYYY MM DD HH MM SS sss"
where sss
is the milliseconds. It seems to work but you may want to fix that with for example substr()
or similar.
Upvotes: 2