Reputation: 902
I'm working on an AWK script that parses millions of lines of text. Each line contains (among other things) a date & time in the form:
16-FEB-2008 14:17:59.994669
I need to convert it into the following form:
20080216141759994669000
I would like to avoid translating the month from text into a numerical value manually, if possible. In bash I can simply run the following command to get the desired result:
date -d "16-FEB-2008 14:17:59.994669" +"%Y%m%d%H%M%S%N"
I have tried invoking this command from AWK, but I cannot figure out how. I would like to know how to do it.
Thanks in advance
Upvotes: 4
Views: 1573
Reputation: 158070
You can call an external command like this:
awk '{
cmd="date -d \""$0"\" +%Y%m%d%H%M%S%N"
cmd | getline ts
print $0, ts
# awk opened a pipe for the communication with
# the command. close that pipe to avoid running
# out of file descriptors
close(cmd)
}' <<< '16-FEB-2008 14:17:59.994669'
Output:
16-FEB-2008 14:17:59.994669 20080216141759994669000
Thanks to dave_thompson_085's comment, you can significantly improve the performance if you have date from GNU coreutils and gawk. GNU date can read dates from stdin, and gawk supports co-processes, which let you start a single instance of date in the background, write to its stdin and read from its stdout, instead of starting a new date process for every line:
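gawk '{
cmd = "stdbuf -oL date -f /dev/stdin +%Y%m%d%H%M%S%N"
# send the line to the date co-process and read back the converted value
print $0 |& cmd
cmd |& getline ts
print $0, ts
}' <<< '16-FEB-2008 14:17:59.994669'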
Note that you also need the stdbuf -oL command to force date to flush its output line by line. When writing to a pipe, date would otherwise buffer its output, and getline would wait forever for a reply.
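If gawk or stdbuf is not available, the "one date process for all lines" idea can also be sketched as a plain pipeline. This is a minimal sketch, assuming date is GNU coreutils date (for `-f` and `%N`) and that every input line holds only a timestamp; the function name `convert_stamps` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a whole stream of timestamps with ONE
# GNU date process instead of one process per line.
# Assumes GNU coreutils date and one bare timestamp per input line,
# e.g. 16-FEB-2008 14:17:59.994669
convert_stamps() {
  # -f - makes GNU date parse each line of standard input as if passed to -d
  date -f - +%Y%m%d%H%M%S%N
}
```

In bash, `paste -d ' ' file <(convert_stamps < file)` pairs each original line with its converted value. If a line fails to parse, date reports an error and prints nothing for that line, which would misalign the pairing, so this only suits clean input.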
Upvotes: 4
Reputation: 439058
Many good answers here. Here's one that uses an awk helper function to reformat the dates.
awk '
BEGIN {
mi["JAN"]="01"; mi["FEB"]="02"; mi["MAR"]="03"; mi["APR"]="04"; mi["MAY"]="05"; mi["JUN"]="06"
mi["JUL"]="07"; mi["AUG"]="08"; mi["SEP"]="09"; mi["OCT"]="10"; mi["NOV"]="11"; mi["DEC"]="12"
}
function reformatDate(dtStr, tmStr) {
split(dtStr, dtParts, "-"); gsub(/[:.]/, "", tmStr)
return dtParts[3] mi[dtParts[2]] sprintf("%02d", dtParts[1]) tmStr "000"
}
{ print reformatDate($1, $2) }
' <<<'16-FEB-2008 14:17:59.994669'
Upvotes: 2
Reputation: 67497
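Here there is no need to invoke date; you just need a month lookup. This assumes each line holds just the timestamp: -F'[- :.]' splits it into day, month, year, hour, minute, second and fraction fields, which are then reordered and joined with an empty OFS: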
$ awk -F'[- :.]' -v OFS='' '
BEGIN {split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC",m);
for(i=1;i<=12;i++) a[m[i]]=i<10?"0"i:i}
{$2=a[$2]; y=$3; $3=$1; $1=y; print $0 "000"}' file
Upvotes: 1
Reputation: 38990
Converting month names to numbers in awk is easy, and so is the reformatting, as long as you don't need the (additional) validation that date does 'for free':
$ echo this 16-FEB-2008 14:17:59.994669 that \
> | awk '{ split($2,d,"-"); split($3,t,"[:.]");
m=sprintf("%02d",index("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC",d[2])/4+1);
print $1,d[3] m d[1] t[1] t[2] t[3] t[4] "000",$4 }'
this 20080216141759994669000 that
$ # or can put the script in a file and use with awk -f
$ # or the whole thing in a shebang file like #!/bin/awk -f
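The month number comes from index(): each abbreviation plus its trailing space takes four characters, so FEB starts at position 5, and 5/4+1 = 2.25, which %02d truncates to 02.
This is not much longer than the code to run date, and it is much more efficient for 'millions of lines' because it does not start a separate date process for every line.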
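The same index() lookup can be extended to the "among other things" part of the question. A minimal sketch, assuming the timestamp may appear anywhere on the line and always has a two-digit day, an upper-case English month abbreviation and exactly six fractional digits; the function name `reformat_ts` is hypothetical. A timestamp whose month is not in the list is left untouched, a crude stand-in for the validation date would do.

```shell
#!/bin/sh
# Hypothetical helper: rewrite every DD-MON-YYYY HH:MM:SS.ffffff timestamp
# on each input line as YYYYMMDDHHMMSSffffff000 without running date.
# Assumes two-digit days, upper-case English month abbreviations and
# exactly six fractional digits (the format shown in the question).
reformat_ts() {
  awk '
  BEGIN {
    months = "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC"
    # spelled out without {n} intervals so older awks accept it too
    re = "[0-9][0-9]-[A-Z][A-Z][A-Z]-[0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9][.][0-9][0-9][0-9][0-9][0-9][0-9]"
  }
  {
    out = ""; rest = $0
    while (match(rest, re)) {
      stamp = substr(rest, RSTART, RLENGTH)
      pos = index(months, substr(stamp, 4, 3))
      if (pos > 0) {   # unknown month names are copied through unchanged
        ymd = substr(stamp, 8, 4) sprintf("%02d", (pos - 1) / 4 + 1) substr(stamp, 1, 2)
        hms = substr(stamp, 13, 2) substr(stamp, 16, 2) substr(stamp, 19, 2)
        stamp = ymd hms substr(stamp, 22, 6) "000"
      }
      # keep the text before the match, then continue after it
      out = out substr(rest, 1, RSTART - 1) stamp
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }'
}
```

For example, `echo 'this 16-FEB-2008 14:17:59.994669 that' | reformat_ts` prints `this 20080216141759994669000 that`. Only POSIX awk features (match, RSTART, RLENGTH, substr, index) are used, so it should not depend on gawk.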
Upvotes: 5
Reputation: 63952
Using perl:
LANG=C perl -MTime::Piece -plE 's/\b(\d+-\w{3}-\d{4}\s+\d+:\d+:\d+)\.(\d+)\b/Time::Piece->strptime($1,q{%d-%b-%Y %H:%M:%S})->strftime(q{%Y%m%d%H%M%S}).$2.q{000}/ge' < in >out
This replaces every timestamp-like match on each line with the reformatted (and validated) date.
The core module Time::Piece doesn't support fractional seconds, so the fraction is captured separately and appended after the formatted date, which is a bit hackish.
Upvotes: 2