jasonmclose
jasonmclose

Reputation: 1695

How to change timestamps via timezone for a field in a delimited file in Linux?

Let's say I have a file like the following:

1,aaa,2016-12-01 01:02:03 EST,bbb
2,ccc,2016-12-02 04:05:06 CST,ddd
3,eee,2016-12-03 07:08:09 EST,fff

I want to add a 5th field, which is the timestamp in field 3, but converted to UTC.

This can invoke a call to date or whatever within Linux. I'm not too worried about performance, as it is called only once very so often on a small amount of files.

I just can't seem to figure out the best way to do this. awk's strftime doesn't receive a timezone field, so I can't figure out the best way to use it.

Upvotes: 0

Views: 258

Answers (2)

kvantour
kvantour

Reputation: 26471

Using GNU awk, you can do this by manipulating the TZ environment variable. The format is a bit nasty when you are dealing with time-offsets only, but if you have the std-string that specifies the time-zone, it is straightforward.

In awk, the environment is stored in the array ENVIRON. Modificaion of ENVIRON is implementation defined:

ENVIRON: An array representing the value of the environment, as described in the exec functions defined in the System Interfaces volume of POSIX.1-2017. The indices of the array shall be strings consisting of the names of the environment variables, and the value of each array element shall be a string consisting of the value of that variable. If appropriate, the environment variable shall be considered a numeric string (see Expressions in awk); the array element shall also have its numeric value.

In all cases where the behaviour of awk is affected by environment variables (including the environment of any commands that awk executes via the system function or via pipeline redirections with the print statement, the printf statement, or the getline function), the environment used shall be the environment at the time awk began executing; it is implementation-defined whether any modification of ENVIRON affects this environment.

source: POSIX.1-2017

GNU awk, on the other hand, states the following:

However, beginning with version 4.2, if not in POSIX compatibility mode, gawk does update its own environment when ENVIRON is changed, thus changing the environment seen by programs that it creates.

So this can now be exploited by doing somethinglike this:something like:

ENVIRON["TZ"] = std offset

Here are a couple of examples:

ENVIRON["TZ"] = "UTC"
ENVIRON["TZ"] = "UTC+03:00"
ENVIRON["TZ"] = "CET"

With respect to the OP, we can do this:

awk 'BEGIN{FS=OFS=","}
     {time=$2; gsub(/[^0-9]/," ",time); tz=$2; gsub(/^.* /,"",tz)}
     {ENVIRON["TZ"]=tz; print $0,strftime("%F %T",mktime(time),1)}
    ' file

Note: for CSV files, you should use What's the most robust way to efficiently parse CSV using awk?

Note: sadly, not all known forms of TZ definitions are recognized.

Upvotes: 0

Benjamin W.
Benjamin W.

Reputation: 52122

With a Bash loop:

while IFS=, read -r -a linearr; do
    printf '%s,' "${linearr[@]}"
    date +'%F %T %Z' -u -d "${linearr[2]}"
done < infile

This results in

1,aaa,2016-12-01 01:02:03 EST,bbb,2016-12-01 06:02:03 UTC
2,ccc,2016-12-02 04:05:06 CST,ddd,2016-12-02 10:05:06 UTC
3,eee,2016-12-03 07:08:09 EST,fff,2016-12-03 12:08:09 UTC

This reads each line into an array linearr, prints the line with an added comma at the end and then appends the new date string.

Alternatively, with paste and cut:

paste -d, infile <(while read line; do date +'%F %T %Z' -u -d "$line"; done < <(cut -d , -f 3 infile))

or, more readable:

paste -d , \
    infile \
    <(while read line; do
          date +'%F %T %Z' -u -d "$line"
      done < <(cut -d, -f3 infile)
    )

Notice that the -d option to date is specific to GNU date. POSIX date has no option to use a date other than the current system date, and date in FreeBSD uses another option, -r, which expects "seconds since the Epoch" as its argument.

Upvotes: 2

Related Questions