Reputation: 411
So I'm running a script I wrote for a language modeling task. On the "to_remove=" line, it errors out with "/usr/bin/awk: Argument list too long" even though there are only 4 arguments.
my code:
echo "Removing n-grams that contain a word with count < $min_count"
counts=`cat combined_counts`
to_remove=`awk -v c=$min_count '( NF == 2 && $NF < c ) {print $1}' combined_counts`
for unigram in $to_remove; do
counts=`echo "$counts" | egrep -v "\b$unigram\s"`
done
echo "$counts" > combined_counts
output:
Removing n-grams that contain a word with count < 3
/home/likewise-open/AD/bherman/new_decoder/language_model/scripts/create_lm: line 210: /usr/bin/awk: Argument list too long
I've also tried replacing the troublesome line with:
awk -v c=$min_count '( NF == 2 && $NF < c ) {print $1}' combined_counts > unigrams_to_remove
But it gives the same error and the unigrams_to_remove file is empty.
The weirdest part is that, when I run the same code from the command line immediately afterwards (meaning the combined_counts file is unchanged), it doesn't error out.
AD\bherman@cluster4:~/new_decoder/language_model/working/filter_tests
$ min_count=3
AD\bherman@cluster4:~/new_decoder/language_model/working/filter_tests
$ to_remove=`awk -v c=$min_count '( NF == 2 && $NF < c ) {print $1}' combined_counts`
AD\bherman@cluster4:~/new_decoder/language_model/working/filter_tests
$ echo "$to_remove" | wc -l
15211
Upvotes: 1
Views: 2026
Reputation: 4551
echo "Removing n-grams that contain a word with count < $min_count"
awk -v c=$min_count '( NF == 2 && $NF < c ) {print}' combined_counts |
grep -Fvxf - combined_counts > tmp
mv tmp combined_counts
Edit:
clarification
The awk statement prints the lines to remove and pipes them straight to grep (rather than storing them in a huge shell variable). grep then subtracts those lines from combined_counts and stores the result in a file called tmp. The last line moves tmp over the original file, so no extra files are left behind.
Yes, I know the grep -Fvxf command is really cool and allows minimal and elegant scripting :D
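To see what the pipeline does on a small input, here is a minimal sketch. The sample data and the temporary working directory are made up for illustration, and the file layout ("word [word ...] count") is assumed from the question.

```shell
#!/bin/sh
# Sketch only: the sample data below is hypothetical, not from the original post.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Assumed layout: one n-gram per line, with the count as the last field.
printf '%s\n' 'the 10' 'rare 1' 'the rare 1' 'the cat 4' 'cat 5' > combined_counts
min_count=3

# awk prints every two-field (unigram) line whose count is below min_count.
# grep reads those lines as patterns from stdin (-f -), treats them as fixed
# strings (-F), requires whole-line matches (-x), and inverts the match (-v).
awk -v c="$min_count" 'NF == 2 && $NF < c' combined_counts |
    grep -Fvxf - combined_counts > tmp
mv tmp combined_counts

cat combined_counts
```

On this sample the only line dropped is "rare 1". Because of -x, a longer line such as "the rare 1" is kept, so if n-grams containing the rare word also have to go, a further filtering step would still be needed.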
Edit2: Cultivated the script further thanks to tripleee's comment!
Edit3: fixed typo in script: changed "{print}" to "{print $1}"
Edit4: fixed missing file argument for awk
Upvotes: 3
Reputation: 203985
The script you posted is 7 lines long, calls awk with 1 argument, and CANNOT produce the error message you posted. You also cannot reproduce the error when running the isolated awk command from the command line.
The error message you posted:
/home/likewise-open/AD/bherman/new_decoder/language_model/scripts/create_lm: line 210: /usr/bin/awk: Argument list too long
is complaining about line 210 of some shell script that is invoking awk with too many arguments.
Therefore your problem is not with the script you showed us. If it's part of some larger script then look earlier in that script for a mismatched quote or something. You could start by commenting out parts of it until you can isolate the part that needs to exist for the error to be output.
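One thing that may be worth checking while isolating the failing part, offered as a guess rather than a diagnosis: the limit behind "Argument list too long" covers the combined size of the arguments and the environment passed to the new process, so a very large exported variable earlier in the script could trigger it even for a short command line. A minimal sketch, where env_bytes is a hypothetical helper name, that could be pasted just before the failing line:

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script): number of bytes
# in the environment that would be handed to every child process.
env_bytes() {
    env | wc -c | tr -d ' '
}

# Print the system limit next to the current environment size.
echo "ARG_MAX: $(getconf ARG_MAX)"
echo "environment bytes: $(env_bytes)"
```

If the environment size is close to or above the limit, look for an export (or set -a) of a large variable earlier in the script.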
Upvotes: 1