Reputation: 3337
I'm looking for a better way to manipulate a date format into something that I want. I do manage to do it, but I have to process the files several times because I cannot get date to do it in one pass.
The format I have:
Wed Jan 30 08:00:00 2019 : misc data
The format I want:
30/01/2019 08:00:00 : misc data
However, I am only able to get date to process the date info if it is in the format:
30-Jan-2019 08:00:00 : misc data
(note: misc data is a long string containing many unwieldy characters)
To achieve what I want I am using:
awk '{("date --date="$3"-"$2"-"$5"\\ "$4" +%F") | getline $1;$2="";$3="";$4;$5=""} 1' oldfile | tr -s ' ' > newfile
What this does is create a format that date can use, parse the result into field $1, clear fields 2, 3, and 5, print the line (keeping the time in field 4, and misc data), strip out the extra spaces left by the blank fields, and save the result to a new file. I then have to manipulate the format, including the separators (because date doesn't like / if using a named month), into a new format, and the whole process is becoming too complicated.
I then run another awk over it, swapping fields and separators around.
I'm sure this can be streamlined, but it's starting to confuse me now.
I do realise I should be using the output format of date, but because there are slashes involved, as soon as I include single or double quotes, or try to escape them, I find that anything involving multiple format elements fails.
To make it worse, this all works when I work on a limited set of data (usually a sample limited by head or tail), but the original file is some 20,000 entries long and it fails at FNR=1043 with "too many open files". Only one file is open and one file is saved. I think this is a result of using getline. Is there a way to do this without using it?
Upvotes: 1
Views: 156
Reputation: 8711
Another awk approach:
$ echo 'Wed Jan 30 08:00:00 2019 : misc data' | awk -F: -v OFS=: ' { t=$NF;NF--;
cmd="date -d\047" $0 "\047 \047+%d/%m/%Y %T\047"; if ( (cmd | getline line) > 0 )
close(cmd); print line,t}'
30/01/2019 08:00:00: misc data
$
Upvotes: 0
Reputation: 203413
You don't need to call date just to shuffle text around:
$ echo 'Wed Jan 30 08:00:00 2019 : misc data' |
awk '{
mthNr = (index("JanFebMarAprMayJunJulAugSepOctNovDec",$2)+2)/3
date = sprintf("%02d/%02d/%04d %s", $3, mthNr, $5, $4)
sub(/^([^ ]+ +){5}/,"")
print date, $0
}'
30/01/2019 08:00:00 : misc data
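The month lookup above works because each three-letter name starts at position 1, 4, 7, ... in the lookup string, so (position + 2) / 3 gives 1 through 12. As a minimal sketch for running this over a whole file, the same idea can be wrapped in a shell function. The name reformat_dates is hypothetical, and the pass-through of lines whose second field is not a month name is an assumption about what you would want, not something stated in the question. The regex is spelled out field by field rather than using {5} because some older awk builds may not support interval expressions.

```shell
# reformat_dates: hypothetical helper name, not from the original post.
# Reads the named files (or stdin when no file is given) and rewrites
# "Wed Jan 30 08:00:00 2019 : misc" as "30/01/2019 08:00:00 : misc".
reformat_dates() {
  awk '{
    # Position of the month abbreviation in the lookup string: 1, 4, 7, ...
    pos = index("JanFebMarAprMayJunJulAugSepOctNovDec", $2)

    # Assumption: a line whose 2nd field is not a month name is printed unchanged.
    if (length($2) != 3 || pos % 3 != 1) { print; next }

    mthNr = (pos + 2) / 3
    date = sprintf("%02d/%02d/%04d %s", $3, mthNr, $5, $4)

    # Remove the five leading date fields; the rest of the line is left as-is.
    sub(/^[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ */, "")
    print date, $0
  }' "$@"
}
```

Usage would be something like reformat_dates oldfile > newfile. No external command is started per line, so there is no pipe to run out of on a 20,000-line file.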
The "too many open files" error you got, by the way, is because you aren't closing the pipe after every invocation of getline. See http://awk.freeshell.org/AllAboutGetline for when and how to use getline robustly.
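If you do want to keep calling date, a rough sketch of the close-after-getline pattern could look like the following. It assumes GNU date (for the -d option) and that the date fields contain no quote characters, since they are pasted straight into a shell command. The function name convert_with_date and the "?" placeholder for unparsable lines are hypothetical choices for illustration.

```shell
# convert_with_date: hypothetical helper name; assumes GNU date (-d option).
convert_with_date() {
  awk '{
    # Build one date command per line; \047 is a single quote.
    cmd = "date -d \047" $3 "-" $2 "-" $5 " " $4 "\047 \047+%d/%m/%Y %T\047"

    stamp = ""
    if ((cmd | getline stamp) <= 0)
      stamp = "?"          # assumption: mark lines that date could not parse

    close(cmd)             # release the pipe before moving to the next record

    # Remove the five leading date fields; the rest of the line is left as-is.
    sub(/^[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ */, "")
    print stamp, $0
  }' "$@"
}
```

Because close(cmd) runs for every record, only one pipe is open at a time regardless of how many lines the file has. It still starts one date process per line, so it will be much slower than rearranging the text in awk alone.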
Upvotes: 3