Juliano Boquett

Reputation: 23

error in awk: "cannot open - too many open files"

I need to split a huge file (about 4 million lines) into subfiles based on a pattern.

I always use awk for this, and it works perfectly on files of up to about a hundred thousand lines. Larger files return the following error:

awk: cannot open "filename" for output (Too many open files)

Here is the command line that I'm using:

awk '{OFS="\t"; print $1,$2,$3,$4,$12 > $10"_"$8.txt"}' mybigfile.txt

There are about 4 or 5 thousand different patterns in $10 that I need to split the file on.

How can I overcome this error? Where should I insert the close command? (I'm using the awk in the Ubuntu distribution.)

Upvotes: 2

Views: 1518

Answers (2)

RavinderSingh13

Reputation: 133610

Each time a new combination of $10 and $8 appears, awk creates a new output file and writes the line to it. Because the awk program is still running, it does not close those files behind the scenes, so they accumulate until the program hits its limit on open files. That is why we have to close the files ourselves.

Please try the following and let me know if it helps.

awk 'BEGIN{OFS="\t";} {if(prev){close(prev)};print $1,$2,$3,$4,$12 >> ($10"_"$8".txt");prev=$10"_"$8".txt"}' mybigfile.txt
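To check whether the open-file limit is the likely cause, one option is to compare the shell's limit with the number of distinct output names the split would create. This is only a rough sketch: limit_demo.txt is a hypothetical three-line stand-in for mybigfile.txt, and the limit reported by ulimit -n depends on the system (it is often 1024 on Linux, which would be well below 4 or 5 thousand outputs).

```shell
# Hypothetical 12-column sample standing in for mybigfile.txt.
printf '%s\n' \
  'a1 a2 a3 a4 x x x K1 x P1 x a12' \
  'b1 b2 b3 b4 x x x K1 x P1 x b12' \
  'c1 c2 c3 c4 x x x K2 x P2 x c12' > limit_demo.txt

# Soft limit on open file descriptors for processes started from this shell.
echo "open-file limit: $(ulimit -n)"

# Number of distinct output names the split would create ($10 "_" $8).
awk '{ seen[$10 "_" $8] = 1 }
     END { n = 0; for (k in seen) n++; print "distinct outputs: " n }' limit_demo.txt
```

For the sample above, the second command reports 2 distinct outputs. If the count for the real file exceeds the limit, an awk that does not manage open files itself will fail unless the script calls close().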

Upvotes: 1

Ed Morton

Reputation: 203985

Copy/paste exactly this command and it will work:

awk 'BEGIN{OFS="\t"} {out=$10"_"$8".txt"; print $1,$2,$3,$4,$12 >> out; close(out)}' mybigfile.txt

You've been experiencing 2 problems:

1) You're using an awk that is not GNU awk and so doesn't close files for you when needed, and

2) You're re-typing the commands people suggest instead of copy-pasting them, and messing up the quotes when you do so, just like in the script in your question.

If you can use gawk then it'd simply be:

awk 'BEGIN{OFS="\t"} {print $1,$2,$3,$4,$12 > ($10"_"$8".txt")}' mybigfile.txt

Unlike with several other awks, you don't technically need to parenthesize the expression on the right side of output redirection with gawk, but it's a good habit to get into for portability, and it helps readability.
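The portable append-and-close command reopens the output file for every input line. A possible variant, sketched here under assumptions and not taken from either answer, closes a file only when the target name changes, so at most one output file is open at a time. The split_demo directory, sample.txt, and the dir variable are hypothetical names used only for this demo.

```shell
# Scratch directory for the demo; a hypothetical name, removed first because
# ">>" appends to whatever a previous run left behind.
rm -rf split_demo && mkdir split_demo

# Hypothetical 12-column sample standing in for mybigfile.txt.
printf '%s\n' \
  'a1 a2 a3 a4 x x x K1 x P1 x a12' \
  'b1 b2 b3 b4 x x x K1 x P1 x b12' \
  'c1 c2 c3 c4 x x x K2 x P2 x c12' \
  'd1 d2 d3 d4 x x x K1 x P1 x d12' > split_demo/sample.txt

awk -v dir=split_demo 'BEGIN { OFS = "\t" }
{
    out = dir "/" $10 "_" $8 ".txt"
    if (out != prev) {                 # target changed since the previous line
        if (prev != "") close(prev)    # so at most one output file stays open
        prev = out
    }
    print $1, $2, $3, $4, $12 >> out   # ">>" so a reopened file is appended to
}' split_demo/sample.txt
```

With this sample, split_demo/P1_K1.txt ends up with three lines (a, b, d) and split_demo/P2_K2.txt with one. Because every write uses >>, running the command a second time without deleting the old output files would append duplicate lines to them.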

Upvotes: 2
