Reputation: 586
I have a file with the following structure
#data 28-Sep-2020 16:48:04
#Version 1.1
#start
27-Sep-2020 16:00:22.00 83004 83.004784 uA 1
27-Sep-2020 16:01:22.00 82821 82.821602 uA 1
27-Sep-2020 16:02:22.00 82786 82.786552 uA 1
27-Sep-2020 16:03:22.00 82666 82.666336 uA 1
27-Sep-2020 16:04:22.00 82837 82.837242 uA 1
27-Sep-2020 16:05:22.00 82579 82.579857 uA 1
27-Sep-2020 16:06:22.00 82693 82.693413 uA 1
27-Sep-2020 16:08:22.00 82700 82.700043 uA 1
27-Sep-2020 16:09:22.00 82646 82.646797 uA 1
27-Sep-2020 16:10:22.00 82794 82.794540 uA 1
27-Sep-2020 16:11:22.00 82600 82.600845 uA 1
27-Sep-2020 16:12:22.00 82815 82.815422 uA 1
27-Sep-2020 16:13:22.00 82866 82.866974 uA 1
I'm trying to append to a file the first column in a %Y %-m %-d date format, the second one in a %-H %-M date format, and lastly the 4th one as is:
2020 9 27 16 0 83.004784
2020 9 27 16 1 82.821602
2020 9 27 16 2 82.786552
2020 9 27 16 3 82.666336
2020 9 27 16 4 82.837242
2020 9 27 16 5 82.579857
2020 9 27 16 6 82.693413
2020 9 27 16 7 82.700043
2020 9 27 16 8 82.646797
2020 9 27 16 9 82.794540
2020 9 27 16 10 83.004784
2020 9 27 16 11 82.600845
2020 9 27 16 12 82.815422
2020 9 27 16 13 82.866974
I thought of using getline and the date command, so this is what I'm doing in a one-liner (I'm just splitting the command here for the sake of clarity) for the first column:
$awk '{if(NR>=4)parsedate="date --date="$1" +\"%Y %-m %-d\""
cmd | getline mydate
close(parsedate);
if(NR>=4 && NR<=10) print mydate, $4}' inputfile
This works fine and fast. When I try to do the same for the second column using the following one-liner
$awk '{if(NR>=4)parsedate="date --date="$2" +\"%-H %-M\""
cmd | getline mydate close(parsedate);
if(NR>=4 && NR<=10) print mydate, $4}' inputfile
it's significantly slower (the input file is large, so I think it's ignoring the if statements), and although it prints what it's supposed to print (i.e. 16 0 83.004784 for the 4th line), it fails with the following error:
awk: cmd. line:1: (FILENAME=inputfile FNR=1023) fatal: cannot open pipe `date --date=08:59:22.00 +"%-H %-M"' (Too many open files)
What's strange to me is that I am indeed calling close(), so I have no idea why it complains, and only in the hour case.
Any ideas are more than welcome!
Upvotes: 1
Views: 703
Reputation: 5975
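First of all, the error is probably caused by close() not actually closing the pipe: it has to be called with exactly the same command string that opened it. That would also explain why only the time column fails: each line has a different time, so each line opens a new pipe, whereas the date is the same on every line and maps to a single command string. But even after resolving that, making one call to the system date command for every log line, when logs usually have many lines, gives an extremely slow script.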
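The getline/close pattern can be sketched as follows, assuming GNU date (for --date and the %-H, %-M specifiers) is available. The variable names cmd and out and the two sample lines are only illustrative; the point is that the same variable is used to open the pipe and to close it.

```shell
# Two sample lines in the format shown in the question, fed through stdin.
out=$(printf '%s\n' \
  '27-Sep-2020 16:00:22.00 83004 83.004784 uA 1' \
  '27-Sep-2020 16:01:22.00 82821 82.821602 uA 1' |
awk '{
    # Build the command string once and reuse the same variable below.
    # $2 is pasted into a shell command unquoted: only safe for trusted input.
    cmd = "date --date=" $2 " +\"%-H %-M\""
    if ((cmd | getline mydate) > 0)
        print mydate, $4
    close(cmd)   # must be the exact string that opened the pipe
}')
printf '%s\n' "$out"
```

This removes the "Too many open files" failure, but it still starts one date process per input line, so it stays slow on a large file.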
To avoid spawning a process per line, use the GNU awk time functions or, even better when the requirements allow it, as they do here, only string functions. Usually we just rearrange fields with the help of split() or match(), but if there are month names to convert to numbers, there is a standard way to do it.
awk 'NR>3{ split($1, dat, "-"); split($2, tim, ":")
m=(index("JanFebMarAprMayJunJulAugSepOctNovDec", dat[2])+2)/3
print dat[3], m, dat[1], tim[1], tim[2], $4 }' file
We define a string with all the 3-letter month names, and for any month to convert, index() returns the position where that substring begins (Jan starts at character 1, Feb at 4, Mar at 7, etc.), so (i+2)/3 gives the month number.
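The arithmetic can be checked in isolation with a small sketch. The variable names (months, names, pos, out) are only illustrative, and the input is assumed to be a valid three-letter abbreviation; for an unknown string index() returns 0 and the formula no longer yields a whole month number.

```shell
out=$(awk 'BEGIN {
    months = "JanFebMarAprMayJunJulAugSepOctNovDec"
    n = split("Jan Feb Sep Dec", names, " ")
    for (i = 1; i <= n; i++) {
        pos = index(months, names[i])   # 1-based start of the abbreviation
        # Each name is 3 characters wide, so (pos + 2) / 3 is its rank.
        printf "%s %d %d\n", names[i], pos, (pos + 2) / 3
    }
}')
printf '%s\n' "$out"
```

Each output line shows the abbreviation, its starting position in the string, and the resulting month number.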
Output:
2020 9 27 16 00 83.004784
2020 9 27 16 01 82.821602
2020 9 27 16 02 82.786552
2020 9 27 16 03 82.666336
2020 9 27 16 04 82.837242
2020 9 27 16 05 82.579857
2020 9 27 16 06 82.693413
2020 9 27 16 08 82.700043
2020 9 27 16 09 82.646797
2020 9 27 16 10 82.794540
2020 9 27 16 11 82.600845
2020 9 27 16 12 82.815422
2020 9 27 16 13 82.866974
These are the raw field values (note that the minutes keep their leading zeros); you can use printf for any formatting you may want.
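As one possible formatting, here is a sketch that uses printf with %d to drop the leading zeros, which matches the %-H %-M style asked for in the question. The sample input and the out variable are only illustrative; the value column is printed with %s so its trailing zeros are kept as they appear in the file.

```shell
out=$(printf '%s\n' \
  '#data 28-Sep-2020 16:48:04' \
  '#Version 1.1' \
  '#start' \
  '27-Sep-2020 16:00:22.00 83004 83.004784 uA 1' \
  '27-Sep-2020 16:08:22.00 82700 82.700043 uA 1' |
awk 'NR>3 { split($1, dat, "-"); split($2, tim, ":")
    m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", dat[2]) + 2) / 3
    # %d converts the string fields to integers, so "00" prints as 0
    # and "08" prints as 8; %s leaves the measurement untouched.
    printf "%d %d %d %d %d %s\n", dat[3], m, dat[1], tim[1], tim[2], $4 }')
printf '%s\n' "$out"
```

To append the result to a file rather than print it, the awk command can be redirected with >> as usual.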
Upvotes: 3