Thanos

Reputation: 586

awk: cannot open pipe Too many open files

I have a file with the following structure:

#data 28-Sep-2020 16:48:04
#Version 1.1
#start
27-Sep-2020 16:00:22.00      83004     83.004784    uA               1
27-Sep-2020 16:01:22.00      82821     82.821602    uA               1
27-Sep-2020 16:02:22.00      82786     82.786552    uA               1
27-Sep-2020 16:03:22.00      82666     82.666336    uA               1
27-Sep-2020 16:04:22.00      82837     82.837242    uA               1
27-Sep-2020 16:05:22.00      82579     82.579857    uA               1
27-Sep-2020 16:06:22.00      82693     82.693413    uA               1
27-Sep-2020 16:08:22.00      82700     82.700043    uA               1
27-Sep-2020 16:09:22.00      82646     82.646797    uA               1
27-Sep-2020 16:10:22.00      82794     82.794540    uA               1
27-Sep-2020 16:11:22.00      82600     82.600845    uA               1
27-Sep-2020 16:12:22.00      82815     82.815422    uA               1
27-Sep-2020 16:13:22.00      82866     82.866974    uA               1

I'm trying to append to a file the first column in a %Y %-m %-d date format, the second one in a %-H %-M format, and lastly the 4th one as is:

2020 9 27     16 0     83.004784
2020 9 27     16 1     82.821602
2020 9 27     16 2     82.786552    
2020 9 27     16 3     82.666336
2020 9 27     16 4     82.837242
2020 9 27     16 5     82.579857
2020 9 27     16 6     82.693413
2020 9 27     16 7     82.700043
2020 9 27     16 8     82.646797
2020 9 27     16 9     82.794540
2020 9 27     16 10    83.004784
2020 9 27     16 11    82.600845
2020 9 27     16 12    82.815422
2020 9 27     16 13    82.866974

I thought of using getline and the date command, so this is what I'm doing in a one-liner (I'm just splitting the command here for the sake of clarity) for the first column:

$awk '{if(NR>=4)parsedate="date --date="$1" +\"%Y %-m %-d\""
                cmd | getline mydate
                close(parsedate);
       if(NR>=4 && NR<=10) print mydate, $4}' inputfile

This works fine and is fast. When I try to do the same for the second column using the following one-liner

$awk '{if(NR>=4)parsedate="date --date="$2" +\"%-H %-M\""
                cmd | getline mydate close(parsedate);
       if(NR>=4 && NR<=10) print mydate, $4}' inputfile

it's significantly slower (the input file is large, so I think it's ignoring the if statements). Although it prints what it's supposed to print (i.e. 16 0 83.004784 for the 4th line), it fails with the following error:

awk: cmd. line:1: (FILENAME=inputfile FNR=1023) fatal: cannot open pipe `date --date=08:59:22.00 +"%-H %-M"' (Too many open files)

What's strange to me is that I am using the close() function, so I have no idea why it complains, and only in the hour case.

Any ideas are more than welcome!

Upvotes: 1

Views: 703

Answers (1)

thanasisp

Reputation: 5975

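First of all, the error is probably caused by close() not taking effect: it has to be called with exactly the same command string that was opened with getline, otherwise every line leaves one more pipe open. But even after resolving that, making one call to the system date command for every log line is extremely slow, because logs usually have many lines.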
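For reference, a minimal sketch of the getline and close() pattern for the date column. It assumes GNU date (for --date and the %-m / %-d specifiers); the function name parse_dates is only illustrative and not part of the original post.

```shell
# Sketch: run GNU date once per data line and close the very same command string.
# Assumes GNU date; parse_dates is a hypothetical helper name.
parse_dates() {
  awk 'NR > 3 {
    # Build the command string once and reuse it for both getline and close().
    cmd = "date --date=\"" $1 "\" +\"%Y %-m %-d\""
    if ((cmd | getline mydate) > 0)   # read the single line date prints
      print mydate, $4
    close(cmd)                        # must be the same string that was opened
  }' "$1"
}

# Usage: parse_dates inputfile
```

This keeps only one pipe open at a time, but it still spawns one process per line, which is the slowness described above.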

So it is better to use the GNU awk time functions or, even better when the requirements allow it, as they do here, only string functions. Usually we just rearrange fields with the help of split() or match(), but if there are month names to convert to numbers, there is a standard way to do it:

awk 'NR>3{ split($1, dat, "-"); split($2, tim, ":")
     m=(index("JanFebMarAprMayJunJulAugSepOctNovDec", dat[2])+2)/3
     print dat[3], m, dat[1], tim[1], tim[2], $4 }' file

We define a string containing all the 3-letter month names, and for any month to convert, index() gives the position where its substring begins (Jan starts at character 1, Feb at 4, Mar at 7, etc.), so (i+2)/3 gives the month number.

Output:

2020 9 27 16 00 83.004784
2020 9 27 16 01 82.821602
2020 9 27 16 02 82.786552
2020 9 27 16 03 82.666336
2020 9 27 16 04 82.837242
2020 9 27 16 05 82.579857
2020 9 27 16 06 82.693413
2020 9 27 16 08 82.700043
2020 9 27 16 09 82.646797
2020 9 27 16 10 82.794540
2020 9 27 16 11 82.600845
2020 9 27 16 12 82.815422
2020 9 27 16 13 82.866974

So these are the data; you can use printf for any formatting you may want.
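As one possible illustration of that formatting step, the sketch below uses printf with %d and adds 0 to the time fields so that leading zeros are dropped, as the question's %-H %-M format asks. The function name format_log and the single-space layout are assumptions chosen for the example.

```shell
# Sketch: string-only conversion with printf controlling the output layout.
# format_log is a hypothetical helper name; adjust the format string as needed.
format_log() {
  awk 'NR > 3 {
    split($1, dat, "-"); split($2, tim, ":")
    # Position of the month name in the string, mapped to 1..12.
    m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", dat[2]) + 2) / 3
    # Adding 0 forces numeric conversion, so "00" prints as 0 and "08" as 8.
    printf "%d %d %d %d %d %s\n", dat[3], m, dat[1] + 0, tim[1] + 0, tim[2] + 0, $4
  }' "$1"
}

# Usage: format_log inputfile >> outputfile
```

No external process is started here, so the run time depends only on awk reading the file.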

Upvotes: 3
