DoubleTrouble

Reputation: 902

How to format a string date (with text and milliseconds) using AWK

I'm working on an AWK script that parses millions of lines of text. Each line contains (among other things) a date and time in the form:

16-FEB-2008 14:17:59.994669

I need to convert this into the following form

20080216141759994669000

I would like to avoid translating the month name into a number manually, if possible. In bash I can simply run the following command to get the desired result:

date -d "16-FEB-2008 14:17:59.994669" +"%Y%m%d%H%M%S%N"

I have tried invoking this command from AWK, but I cannot figure out how. I would like to know:

  1. Is it possible to achieve with AWK alone?
  2. How can I use such a command in an AWK script file?

Thanks in advance

Upvotes: 4

Views: 1573

Answers (5)

hek2mgl

Reputation: 158070

You can call an external command like this:

awk '{
         cmd="date -d \""$0"\" +%Y%m%d%H%M%S%N"
         cmd | getline ts
         print $0, ts
         # awk opened a pipe for the communication with 
         # the command. close that pipe to avoid running
         # out of file descriptors
         close(cmd)
     }' <<< '16-FEB-2008 14:17:59.994669'

Output:

16-FEB-2008 14:17:59.994669 20080216141759994669000

Thanks to dave_thompson_085's comment: you can significantly improve performance if you have date from GNU coreutils and gawk. GNU date supports reading dates from stdin, and gawk supports co-processes, which lets you start a single instance of date in the background, write to its stdin, and read from its stdout:

{
    cmd = "stdbuf -oL date -f /dev/stdin +%Y%m%d%H%M%S%N"
    print $0 |& cmd 
    cmd |& getline ts
    print $0, ts
}

Note that you also need the stdbuf command to force date to write its output line by line; otherwise date buffers its output and gawk waits for a reply that has not been flushed yet.
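
For what it's worth, here is a slightly fuller sketch of the co-process variant, wrapped in a shell function so it can be tried directly. The name stamp_lines is made up for illustration. It assumes gawk, GNU date and stdbuf are installed, and that every input line is a date that date accepts; for a line date rejects, no reply arrives on stdout and the read would most likely block. It only adds a check on getline's return value and closes the co-process at the end.

```shell
# Hypothetical helper (name is illustrative only).
# Assumes: gawk for |&, GNU date for -f, stdbuf from GNU coreutils.
stamp_lines() {
  gawk '
    BEGIN {
      # One long-lived date process serves the whole input.
      cmd = "stdbuf -oL date -f /dev/stdin +%Y%m%d%H%M%S%N"
    }
    {
      print $0 |& cmd                     # send the date string to date
      if ((cmd |& getline ts) <= 0) {     # read back one reply line
        print "no reply from date at line " NR > "/dev/stderr"
        exit 1
      }
      print $0, ts
    }
    END { close(cmd) }                    # shut the co-process down
  '
}

printf '%s\n' '16-FEB-2008 14:17:59.994669' | stamp_lines
```

Because date is started once rather than once per line, the per-line cost is a write and a read on the pipes instead of a fork and exec.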

Upvotes: 4

mklement0

Reputation: 439058

Many good answers here. Here's one that uses an awk helper function to reformat the dates.

awk '
  BEGIN { 
    mi["JAN"]="01"; mi["FEB"]="02"; mi["MAR"]="03"; mi["APR"]="04"; mi["MAY"]="05"; mi["JUN"]="06"
    mi["JUL"]="07"; mi["AUG"]="08"; mi["SEP"]="09"; mi["OCT"]="10"; mi["NOV"]="11"; mi["DEC"]="12"
  }
  function reformatDate(dtStr, tmStr,    dtParts) {  # dtParts is a local variable
    split(dtStr, dtParts, "-"); gsub(/[:.]/, "", tmStr)
    return dtParts[3] mi[dtParts[2]] sprintf("%02d", dtParts[1]) tmStr "000"
  }
  { print reformatDate($1, $2) }
' <<<'16-FEB-2008 14:17:59.994669'

Upvotes: 2

karakfa

Reputation: 67497

There is no need to invoke date here; you just need a month lookup:

$ awk -F'[- :.]' -v OFS='' '
     BEGIN {split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC",m);
            for(i=1;i<=12;i++) a[m[i]]=i<10?"0"i:i}
           {$2=a[$2]; y=$3; $3=$1; $1=y; print $0 "000"}' file
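
To see why swapping $1 and $3 and replacing $2 is enough, it may help to print how that field separator splits the line. This is just an illustration; show_fields is a made-up helper name.

```shell
# Hypothetical helper: list the fields produced by -F'[- :.]'.
show_fields() {
  awk -F'[- :.]' '{ for (i = 1; i <= NF; i++) printf "$%d=%s\n", i, $i }'
}

printf '%s\n' '16-FEB-2008 14:17:59.994669' | show_fields
# $1=16
# $2=FEB
# $3=2008
# $4=14
# $5=17
# $6=59
# $7=994669
```

With an empty OFS, the modified fields are simply glued back together when $0 is rebuilt.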

Upvotes: 1

dave_thompson_085

Reputation: 38990

Converting month names to numbers in awk is easy, and so is the reformatting as long as you don't need the (additional) validation date does 'for free':

$ echo this 16-FEB-2008 14:17:59.994669 that \
> | awk '{ split($2,d,"-"); split($3,t,"[:.]"); 
    m=sprintf("%02d",index("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC",d[2])/4+1);
    print $1,d[3] m d[1] t[1] t[2] t[3] t[4] "000",$4 }'
this 20080216141759994669000 that
$ # or can put the script in a file and use with awk -f
$ # or the whole thing in a shebang file like #!/bin/awk -f

This is not much longer than the code to run date and much more efficient for 'millions of lines'.
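
If you do want some of the validation, a few checks can be added in awk itself. The following is only a sketch under stated assumptions: reformat_checked is a made-up name, each line is assumed to hold exactly the date and the time, and it does not check days per month or leap years the way date would. Since every month name is three letters and the names are separated by spaces, any three-character match from index() is a whole month name.

```shell
# Hypothetical helper: reformat valid lines, report the rest on stderr.
# Assumes input lines of the form "DD-MON-YYYY HH:MM:SS.ffffff".
reformat_checked() {
  awk '
    BEGIN { months = "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC" }
    {
      n = split($1, d, "-")
      k = split($2, t, "[:.]")
      pos = index(months, toupper(d[2]))   # 1, 5, 9, ... for JAN, FEB, MAR, ...
      ok = (NF == 2 && n == 3 && k == 4 && length(d[2]) == 3 && pos > 0 &&
            d[1] ~ /^[0-9][0-9]?$/ && d[1] >= 1 && d[1] <= 31 &&
            d[3] ~ /^[0-9][0-9][0-9][0-9]$/ &&
            t[1] ~ /^[0-9][0-9]$/ && t[1] < 24 &&
            t[2] ~ /^[0-9][0-9]$/ && t[2] < 60 &&
            t[3] ~ /^[0-9][0-9]$/ && t[3] < 60 &&
            t[4] ~ /^[0-9][0-9][0-9][0-9][0-9][0-9]$/)
      if (ok) {
        # (pos + 3) / 4 turns the string offset into the month number.
        printf "%s%02d%02d%s%s%s%s000\n", d[3], (pos + 3) / 4, d[1], t[1], t[2], t[3], t[4]
      } else {
        print "invalid timestamp on line " NR ": " $0 > "/dev/stderr"
      }
    }
  '
}

printf '%s\n' '16-FEB-2008 14:17:59.994669' '31-XYZ-2008 14:17:59.994669' | reformat_checked
```

The first line is printed as 20080216141759994669000; the second is reported on stderr and skipped.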

Upvotes: 5

clt60

Reputation: 63952

Using perl:

LANG=C perl -MTime::Piece -plE 's/\b(\d+-\w{3}-\d{4}\s+\d+:\d+:\d+)\.(\d+)\b/Time::Piece->strptime($1,q{%d-%b-%Y %H:%M:%S})->strftime(q{%Y%m%d%H%M%S}).$2/ge' < in >out

Replaces each time-like pattern globally with the re-formatted (and validated) date.

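The core module Time::Piece doesn't support fractional seconds, so the solution is a bit hackish: the regex captures the fractional part separately and appends it unchanged after the reformatted date.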

Upvotes: 2
