TimeDelta

Reputation: 411

/usr/bin/awk: Argument list too long with only 4 arguments

I'm running a script I wrote for a language modeling task. On the "to_remove=" line, it errors out with "/usr/bin/awk: Argument list too long" even though there are only 4 arguments.

my code:

echo "Removing n-grams that contain a word with count < $min_count"
counts=`cat combined_counts`
to_remove=`awk -v c=$min_count '( NF == 2 && $NF < c ) {print $1}' combined_counts`
for unigram in $to_remove; do
    counts=`echo "$counts" | egrep -v "\b$unigram\s"`
done
echo "$counts" > combined_counts

output:

Removing n-grams that contain a word with count < 3
/home/likewise-open/AD/bherman/new_decoder/language_model/scripts/create_lm: line 210: /usr/bin/awk: Argument list too long

I've also tried replacing the troublesome line with:

awk -v c=$min_count '( NF == 2 && $NF < c ) {print $1}' combined_counts > unigrams_to_remove

But it gives the same error and the unigrams_to_remove file is empty.

The weirdest part is that when I run the same code from the command line immediately afterwards (meaning the combined_counts file is unchanged), it doesn't error out.

AD\bherman@cluster4:~/new_decoder/language_model/working/filter_tests
$ min_count=3
AD\bherman@cluster4:~/new_decoder/language_model/working/filter_tests
$ to_remove=`awk -v c=$min_count '( NF == 2 && $NF < c ) {print $1}' combined_counts`
AD\bherman@cluster4:~/new_decoder/language_model/working/filter_tests
$ echo "$to_remove" | wc -l
15211

Upvotes: 1

Views: 2026

Answers (2)

ShellFish

Reputation: 4551

echo "Removing n-grams that contain a word with count < $min_count"
awk -v c=$min_count '( NF == 2 && $NF < c ) {print}' combined_counts | 
grep -Fvxf - combined_counts > tmp 
mv tmp combined_counts

Edit:

Clarification

The awk statement prints the lines to remove and pipes them straight to grep (rather than storing them in a huge shell variable). grep -Fvxf - reads those lines from standard input as fixed-string patterns and outputs every line of combined_counts that does not exactly match one of them, storing the result in a file called tmp. The last line replaces the original file with tmp.
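To make the behaviour concrete, here is a small sketch on made-up data. The temporary directory and the four sample lines are assumptions for illustration only; the real combined_counts is assumed to hold one n-gram per line followed by its count.

```shell
# Hypothetical sample data in a scratch directory (not the asker's real file).
workdir=$(mktemp -d)
printf '%s\n' 'the 10' 'cat 2' 'dog 5' 'the cat 2' > "$workdir/combined_counts"
min_count=3

# awk prints whole unigram lines (NF == 2) whose count is below the threshold;
# grep -F (fixed strings) -v (invert) -x (whole line) -f - (patterns from stdin)
# then keeps every line of the file that is not one of those lines.
awk -v c="$min_count" 'NF == 2 && $NF < c' "$workdir/combined_counts" |
    grep -Fvxf - "$workdir/combined_counts" > "$workdir/tmp"
mv "$workdir/tmp" "$workdir/combined_counts"

result=$(cat "$workdir/combined_counts")
echo "$result"
```

Because -x demands a whole-line match, only the unigram line "cat 2" is dropped here; the bigram line "the cat 2" is kept.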

Yes I know the grep -Fvxf command is really cool and allows minimal and elegant scripting :D

Edit2: Cultivated the script further thanks to tripleee's comment!

Edit3: fixed typo in script: changed "{print}" to "{print $1}"

Edit4: fixed missing file argument for awk

Upvotes: 3

Ed Morton

Reputation: 203985

The script you posted is 7 lines long, calls awk with 1 argument, and CANNOT produce the error message you posted. As you say yourself, you cannot reproduce the error when running the isolated awk command from the command line.

The error message you posted:

/home/likewise-open/AD/bherman/new_decoder/language_model/scripts/create_lm: line 210: /usr/bin/awk: Argument list too long

is complaining about line 210 of some shell script that is invoking awk with too many arguments.

Therefore your problem is not with the script you showed us. If it's part of some larger script then look earlier in that script for a mismatched quote or something. You could start by commenting out parts of it until you can isolate the part that needs to exist for the error to be output.
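One more thing worth ruling out while isolating it. This is an assumption, since the rest of create_lm isn't shown. The kernel applies the "Argument list too long" limit to the arguments and the exported environment together, so a very large exported variable can make even a short awk call fail inside the script while the same command works from a fresh prompt. A rough sketch of a check you could drop in just before the failing line, using only shell builtins (external tools such as env or wc would hit the same error):

```shell
# export -p is a shell builtin, so it keeps working even when every
# external command fails with "Argument list too long".
exported=$(export -p)

# Approximate size of everything the shell passes on to child processes.
env_chars=${#exported}

echo "exported environment: roughly $env_chars characters"
```

If the number is in the hundreds of thousands or more, compare it with the output of getconf ARG_MAX from a working shell and look for an export (or set -a) earlier in the script.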

Upvotes: 1
