Markus Heller

Pass argument to awk inside a while loop

I have a large number of tab-separated text files, each with the score I'm interested in stored in the second column:

test_score_1.txt

Title   FRED Chemgauss4 File
24937   -6.111582   A
24972   -7.644171   A
26246   -8.551361   A
21453   -7.291059   A

test_score_2.txt

Title   FRED Chemgauss4 File
14721   -7.322331   B
27280   -6.229842   B
21451   -8.407396   B
10035   -7.482369   B
10037   -7.706176   B

I want to check whether any Titles have a score smaller than a threshold I define.

The following script, with the threshold hardcoded, works:

check_scores_1.sh

#!/bin/bash

find . -name 'test_score_*.txt' -type f -print0 |
while read -r -d $'\0' x; do
    awk '{FS = "\t" ; if ($2 < -7.5) print $0}' "$x"
done

If I instead try to pass the threshold to awk as an argument, by running check_scores_2.sh "-7.5" with the script below, it returns all entries from both files.

check_scores_2.sh

#!/bin/bash

find . -name 'test_score_*.txt' -type f -print0 |
while read -r -d $'\0' x; do
    awk '{FS = "\t" ; if ($2 < ARGV[1]) print $0}' "$x"
done

Finally, check_scores_3.sh reveals that the argument from my command line never actually reaches awk.

check_scores_3.sh

#!/bin/bash

find . -name 'test_score_*.txt' -type f -print0 |
while read -r -d $'\0' x; do
    awk '{print ARGV[0] "\t" ARGV[1] "\t" ARGV[2]}' "$x"
done

$ ./check_scores_3.sh "-7.5" gives the following output:

awk ./test_score_1.txt  
awk ./test_score_1.txt  
awk ./test_score_1.txt  
awk ./test_score_1.txt  
awk ./test_score_1.txt  
awk ./test_score_2.txt  
awk ./test_score_2.txt  
awk ./test_score_2.txt  
awk ./test_score_2.txt  
awk ./test_score_2.txt  
awk ./test_score_2.txt  

What am I doing wrong?

Answers (2)

Ed Morton

Your first example:

awk '{FS = "\t" ; if ($2 < -7.5) print $0}' "$x"

only works by the happy coincidence that setting FS makes no difference in your particular case: none of your fields contain spaces, so the default splitting on blanks and tabs yields the same $2 as splitting on tabs. Otherwise it would fail on the first line of each input file, because you don't set FS until AFTER that line has been read and split into fields. You meant this:

awk 'BEGIN{FS = "\t"} {if ($2 < -7.5) print $0}' "$x"

which can be written more idiomatically as just:

awk -F'\t' '$2 < -7.5' "$x"
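The timing problem can be reproduced in isolation. This is a minimal sketch on two made-up records whose first field contains a space. The expected outputs in the comments assume a POSIX-conforming awk such as gawk or mawk; some older awks split fields lazily and may behave differently on the first record.

```shell
#!/bin/bash
# Made-up input: two tab-separated records whose first field contains a space.
input=$'a b\tc\nd e\tf\n'

# FS is assigned in the main rule, so record 1 has already been split on
# blanks when the assignment runs; only record 2 is split on tabs.
printf '%s' "$input" | awk '{ FS = "\t"; print $1 }'
# gawk/mawk print: a
#                  d e

# -F'\t' (like BEGIN{FS = "\t"}) sets FS before any input is read.
printf '%s' "$input" | awk -F'\t' '{ print $1 }'
# prints: a b
#         d e
```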

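For the second case, as you already realised, you're simply not passing the argument in. Inside awk, ARGV holds awk's own command-line arguments, so ARGV[1] is the file name in "$x", not your script's first argument. Comparing $2 against that non-numeric string is a string comparison, which is why every score line was printed. All you need to do is: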

awk -F'\t' -v max="$1" '$2 < max' "$x"

See http://cfajohnson.com/shell/cus-faq-2.html#Q24.
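To see why check_scores_2.sh printed every score line, here is a small sketch on a one-row copy of test_score_1.txt. The file contents come from the question; working in a scratch directory made with mktemp is an assumption of the demo. It prints what ARGV[1] holds inside awk and contrasts the string comparison against it with the -v version.

```shell
#!/bin/bash
# Work in a scratch directory on a one-row copy of test_score_1.txt.
cd "$(mktemp -d)" || exit 1
printf 'Title\tFRED Chemgauss4\tFile\n24937\t-6.111582\tA\n' > test_score_1.txt

# Inside awk, ARGV[1] is awk's own first argument: the file name.
awk 'NR == 1 { print ARGV[1] }' ./test_score_1.txt
# prints: ./test_score_1.txt

# That name is not numeric, so $2 < ARGV[1] compares strings.
# In byte order "-6.111582" sorts before "./test_score_1.txt" ('-' < '.'),
# so the row is printed even though -6.111582 is not below -7.5.
awk -F'\t' 'NR > 1 && $2 < ARGV[1]' ./test_score_1.txt

# With -v, max holds the numeric string -7.5 and the comparison is
# numeric, so nothing is printed.
awk -F'\t' -v max="-7.5" 'NR > 1 && $2 < max' ./test_score_1.txt
```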

John1024

In your shell script, the script's first argument is available as $1. You can assign that value to an awk variable as follows:

find . -name 'test_score_*.txt' -type f -exec awk -v a="$1" -F'\t' '$2 < a' {} +
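As a minimal sketch, the one-liner can be wrapped in a shell function so that a script can pass its own argument through, e.g. check_scores "$1". The function name check_scores is hypothetical; the find and awk command inside it is the one above, unchanged.

```shell
#!/bin/bash
# Hypothetical wrapper: the threshold is the function's first argument.
check_scores() {
    # The shell expands "$1" before find runs; awk sees '$2 < a' literally.
    # With {} +, find hands as many file names as fit to each awk call.
    find . -name 'test_score_*.txt' -type f -exec awk -v a="$1" -F'\t' '$2 < a' {} +
}
```

Run in a directory holding the two sample files from the question, check_scores -7.5 should print the four rows scoring below -7.5 (24972, 26246, 21451 and 10037). The order of the two files depends on the order in which find visits them.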

Discussion

  • Your -print0/while read loop is very good. The -exec option of find, however, makes it possible to run the same command without any explicit loop; with the {} + form, find passes many file names to each awk invocation.

  • The program {if ($2 < -7.5) print $0} can optionally be simplified to just the condition $2 < -7.5, because a condition with no action defaults to the action print $0.

  • Note that $1 and $2 here are entirely unrelated to each other. Because $1 is in double quotes, the shell substitutes its value before the awk command starts to run, and the shell interprets $1 as the first argument to the script. Because $2 appears in single quotes, the shell leaves it alone and it is interpreted by awk, which takes it to mean the second field of the current record.