Reputation: 1695
Let's say I have a file like the following:
1,aaa,2016-12-01 01:02:03 EST,bbb
2,ccc,2016-12-02 04:05:06 CST,ddd
3,eee,2016-12-03 07:08:09 EST,fff
I want to add a 5th field, which is the timestamp in field 3, but converted to UTC.
This can invoke a call to date
or whatever within Linux. I'm not too worried about performance, as it is called only once very so often on a small amount of files.
I just can't seem to figure out the best way to do this. awk's strftime
doesn't receive a timezone field, so I can't figure out the best way to use it.
Upvotes: 0
Views: 258
Reputation: 26471
Using GNU awk, you can do this by manipulating the TZ
environment variable. The format is a bit nasty when you are dealing with time-offsets only, but if you have the std-string that specifies the time-zone, it is straightforward.
In awk, the environment is stored in the array ENVIRON
. Modificaion of ENVIRON
is implementation defined:
ENVIRON
: An array representing the value of the environment, as described in the exec functions defined in the System Interfaces volume of POSIX.1-2017. The indices of the array shall be strings consisting of the names of the environment variables, and the value of each array element shall be a string consisting of the value of that variable. If appropriate, the environment variable shall be considered a numeric string (see Expressions in awk); the array element shall also have its numeric value.In all cases where the behaviour of awk is affected by environment variables (including the environment of any commands that awk executes via the system function or via pipeline redirections with the print statement, the
printf
statement, or thegetline
function), the environment used shall be the environment at the time awk began executing; it is implementation-defined whether any modification of ENVIRON affects this environment.source: POSIX.1-2017
GNU awk, on the other hand, states the following:
However, beginning with version 4.2, if not in POSIX compatibility mode, gawk does update its own environment when
ENVIRON
is changed, thus changing the environment seen by programs that it creates.
So this can now be exploited by doing somethinglike this:something like:
ENVIRON["TZ"] = std offset
Here are a couple of examples:
ENVIRON["TZ"] = "UTC"
ENVIRON["TZ"] = "UTC+03:00"
ENVIRON["TZ"] = "CET"
With respect to the OP, we can do this:
awk 'BEGIN{FS=OFS=","}
{time=$2; gsub(/[^0-9]/," ",time); tz=$2; gsub(/^.* /,"",tz)}
{ENVIRON["TZ"]=tz; print $0,strftime("%F %T",mktime(time),1)}
' file
Note: for CSV files, you should use What's the most robust way to efficiently parse CSV using awk?
Note: sadly, not all known forms of TZ definitions are recognized.
Upvotes: 0
Reputation: 52122
With a Bash loop:
while IFS=, read -r -a linearr; do
printf '%s,' "${linearr[@]}"
date +'%F %T %Z' -u -d "${linearr[2]}"
done < infile
This results in
1,aaa,2016-12-01 01:02:03 EST,bbb,2016-12-01 06:02:03 UTC
2,ccc,2016-12-02 04:05:06 CST,ddd,2016-12-02 10:05:06 UTC
3,eee,2016-12-03 07:08:09 EST,fff,2016-12-03 12:08:09 UTC
This reads each line into an array linearr
, prints the line with an added comma at the end and then appends the new date string.
Alternatively, with paste
and cut
:
paste -d, infile <(while read line; do date +'%F %T %Z' -u -d "$line"; done < <(cut -d , -f 3 infile))
or, more readable:
paste -d , \
infile \
<(while read line; do
date +'%F %T %Z' -u -d "$line"
done < <(cut -d, -f3 infile)
)
Notice that the -d
option to date
is specific to GNU date
. POSIX date
has no option to use a date other than the current system date, and date
in FreeBSD uses another option, -r
, which expects "seconds since the Epoch" as its argument.
Upvotes: 2