error in awk: "cannot open - too many open files"

Question

I need to split a huge file (about 4 million lines) into subfiles based on a pattern.

I always use awk for this, and it works perfectly on files of up to about a hundred thousand lines. Larger files return the following error:

awk: cannot open "filename" for output (Too many open files)

Here is the command line I'm using:

awk '{OFS="	"; print $1,$2,$3,$4,$12 > $10"_"$8.txt"}' mybigfile.txt

Field $10 contains about 4 or 5 thousand different patterns that I need to split the file into.

How can I overcome this error? Where should I put the close() call? (I'm using the awk that ships with Ubuntu.)
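"Too many open files" means the process hit its limit on open file descriptors, so one way to confirm the diagnosis is to compare the number of distinct $10"_"$8 combinations (one output file each) with the shell's open-file limit from `ulimit -n`. The sketch below is a hypothetical check: it builds a tiny made-up stand-in for mybigfile.txt in a temporary directory and assumes whitespace-separated input like the question's; on the real file you would drop the printf lines.

```shell
#!/bin/sh
# Hypothetical diagnostic: compare how many output files the split would
# create with this shell's soft limit on open file descriptors.
set -e
dir=$(mktemp -d)
cd "$dir"

# Tiny made-up stand-in for mybigfile.txt (12 tab-separated columns).
printf 'a1\ta2\ta3\ta4\ta5\ta6\ta7\tX\ta9\tchr1\ta11\ta12\n' >  mybigfile.txt
printf 'b1\tb2\tb3\tb4\tb5\tb6\tb7\tX\tb9\tchr2\tb11\tb12\n' >> mybigfile.txt
printf 'c1\tc2\tc3\tc4\tc5\tc6\tc7\tX\tc9\tchr1\tc11\tc12\n' >> mybigfile.txt

# Count each distinct $10"_"$8 pair once, i.e. one per output file.
distinct=$(awk '!seen[$10 "_" $8]++ {n++} END {print n+0}' mybigfile.txt)
# Soft open-file limit for this shell and the processes it starts.
limit=$(ulimit -n)

echo "distinct output files: $distinct"
echo "open-file limit:       $limit"
```

With thousands of distinct pairs, the count will typically exceed the limit, and an awk that keeps every output file open will fail exactly as in the question.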

Ed Morton · Accepted Answer

Copy/paste exactly this command and it will work:

awk 'BEGIN{OFS="	"} {out=$10"_"$8".txt"; print $1,$2,$3,$4,$12 >> out; close(out)}' mybigfile.txt
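Here is a minimal self-contained sketch of that command running on made-up data. The sample rows and the temporary directory are assumptions for illustration. A fresh directory matters because `>>` appends, so output files left over from an earlier run would otherwise keep growing.

```shell
#!/bin/sh
# Hypothetical demo of the close()-after-every-write approach on a tiny,
# made-up 12-column input, run in a fresh temp dir because ">>" appends
# to any output files left over from a previous run.
set -e
dir=$(mktemp -d)
cd "$dir"

printf 'a1\ta2\ta3\ta4\ta5\ta6\ta7\tX\ta9\tchr1\ta11\ta12\n' >  mybigfile.txt
printf 'b1\tb2\tb3\tb4\tb5\tb6\tb7\tX\tb9\tchr2\tb11\tb12\n' >> mybigfile.txt
printf 'c1\tc2\tc3\tc4\tc5\tc6\tc7\tX\tc9\tchr1\tc11\tc12\n' >> mybigfile.txt

# Same program as above, with "\t" instead of a literal tab for readability.
# Each file is closed right after its write, so at most one is ever open.
awk 'BEGIN{OFS="\t"} {out=$10"_"$8".txt"; print $1,$2,$3,$4,$12 >> out; close(out)}' mybigfile.txt

ls
cat chr1_X.txt    # the a1... row and the c1... row, columns 1-4 and 12
```

The trade-off is that every input line reopens its output file, which is slower than keeping files open but never runs out of descriptors.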

You've been experiencing two problems:

1) You're using an awk that is not GNU awk and so doesn't close files for you when needed, and

2) You're re-typing the commands people suggest instead of copy-pasting them, and you mess up the quotes when you do, just as in the script in your question.
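One way to sidestep shell-quoting mistakes entirely is to keep the awk program in its own file and run it with `awk -f`. The sketch below assumes a hypothetical script name, split.awk, and a one-line made-up input; the program itself is the same close()-after-every-write logic, just without any shell quotes around it.

```shell
#!/bin/sh
# Hypothetical sketch: store the awk program in a file (split.awk is a
# made-up name) so the shell never has to parse awk's quotes.
set -e
dir=$(mktemp -d)
cd "$dir"

# Quoting the 'EOF' delimiter stops the shell from expanding $10, $8, etc.
cat > split.awk <<'EOF'
BEGIN { OFS = "\t" }
{
    out = $10 "_" $8 ".txt"        # one output file per $10/$8 pair
    print $1, $2, $3, $4, $12 >> out
    close(out)                     # keep at most one output file open
}
EOF

printf 'a1\ta2\ta3\ta4\ta5\ta6\ta7\tX\ta9\tchr1\ta11\ta12\n' > mybigfile.txt

awk -f split.awk mybigfile.txt
cat chr1_X.txt
```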

If you can use gawk then it'd simply be:

awk 'BEGIN{OFS="	"} {print $1,$2,$3,$4,$12 > ($10"_"$8".txt")}' mybigfile.txt

Unlike several other awks, gawk doesn't technically require you to parenthesize the expression on the right side of an output redirection, but it's a good habit to get into for portability, and it helps readability.
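If reopening a file for every line is too slow and gawk isn't available, a common alternative (not part of the answer above, so treat it as a sketch) is to sort by the key fields first, then keep only the current output file open and close it when the key changes. It also uses the parenthesized redirection target. The assumptions: `LC_ALL=C` so identical keys compare byte-for-byte and end up adjacent, `-b` so sort ignores leading blanks the same way awk's default field splitting does, and a small made-up input where the chr1 rows are not adjacent.

```shell
#!/bin/sh
# Hypothetical sketch: group lines by $10/$8 with sort, then keep only one
# output file open at a time, closing it whenever the key changes.
set -e
dir=$(mktemp -d)
cd "$dir"

# Made-up input where the two chr1 rows are NOT adjacent.
printf 'a1\ta2\ta3\ta4\ta5\ta6\ta7\tX\ta9\tchr1\ta11\ta12\n' >  mybigfile.txt
printf 'b1\tb2\tb3\tb4\tb5\tb6\tb7\tX\tb9\tchr2\tb11\tb12\n' >> mybigfile.txt
printf 'c1\tc2\tc3\tc4\tc5\tc6\tc7\tX\tc9\tchr1\tc11\tc12\n' >> mybigfile.txt

# Byte-wise sort on field 10, then field 8, ignoring leading blanks so the
# keys match awk's whitespace-split fields exactly.
LC_ALL=C sort -b -k10,10 -k8,8 mybigfile.txt |
awk 'BEGIN { OFS = "\t" }
     { out = ($10 "_" $8 ".txt") }          # parenthesized file name
     out != prev { if (prev != "") close(prev); prev = out }
     { print $1, $2, $3, $4, $12 > out }'

cat chr1_X.txt
```

Because each output file is opened exactly once, `>` is safe here and rerunning the command overwrites old output instead of appending to it; the cost is sorting the 4-million-line input first.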
