Carmen Sandoval

Reputation: 2356

While read line, awk $line

I have a file that contains a list of numbers. I have a second file with various entries and several fields each.

For each number in the list, I want to collect all the lines of the second file whose 12th field equals that number and write them to a new file: one file for the first number, another for the second, and so on.

I wrote a one-liner that makes sense, but I can't figure out why it won't work.

This is the list of numbers:

cat truncations_list.txt

3
318
407
412
7

The file with the entries to be sorted is:

M00970:45:000000000-A42FD:1:1101:14736:1399 TGCCCAGTGCTCTGAATGTNNNNNTGAAGAAATTCAAGTAAGCGCGGGTCATCGGCNGGAGTAACTATGACTCTNTTAAGGAGGACCAATATGAACCANACNNNNNNNNNACTNTATCTAGGGTTCCCTGCACAGTATGTGNCC    79  TGCCCAGTGCTCTGAATGTNNNNNTGAAGAAATTCAAGTAAGCGCGGGTCATCGGCNGGAGTAACTATGACTCTNTTAA 65  GGAGGACCAATATGAACCANACNNNNNNNNNACTNTATCTAGGGTTCCCTGCACAGTATGTGNCC   79S65M  1   81  TGCCCAGTGCTCTGAATGTNNNNNTGAAGAAATTCAAGTAAGCGCGGGTCATCGGCNGGAGTAACTATGACTCTNTTAAGG   -2  318
M00970:45:000000000-A42FD:1:1101:15371:1399 TGCCCAGTGCTCTGAATGTNNNNNTGAAGAAATTCAAGTAAGCGCGGGTCAACGGCNGGAGTAACTATGACTCTNTTAAGGAGTCGGTGTTCACATGCNATNNNNNNNNNCAGNCGAACTTGATGAAGAACGTCGACGTGTNGG    83  TGCCCAGTGCTCTGAATGTNNNNNTGAAGAAATTCAAGTAAGCGCGGGTCAACGGCNGGAGTAACTATGACTCTNTTAAGGAG 61  TCGGTGTTCACATGCNATNNNNNNNNNCAGNCGAACTTGATGAAGAACGTCGACGTGTNGG   83S61M  1   81  TGCCCAGTGCTCTGAATGTNNNNNTGAAGAAATTCAAGTAAGCGCGGGTCAACGGCNGGAGTAACTATGACTCTNTTAAGG   2   407

This is my command:

file="truncations_list.txt"
while read line; do awk '$12==$line' R2_Output.txt >reads_$line.txt ; done <"$file"

This command creates all the files (reads_412.txt, etc.), but every one of them is empty.

I appreciate your help!

Upvotes: 0

Views: 6318

Answers (2)

Amit Naidu

Reputation: 2648

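Minimizing references to $x field variables can improve awk performance. It mostly matters for more complex scripts, but it's worth trying this slight optimization if you are processing large files with millions of records. The one-liner below stores $12 in f once per record and reuses it; note that it writes each group to a file named after the number itself (for example 318) rather than reads_318.txt: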

 awk 'FNR==NR {a[$1]; next} (f=$12) in a {print >f}' trunc.txt R2_Out.txt

Upvotes: 0

glenn jackman

Reputation: 247042

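Your main problem is that the awk program is in single quotes, so the shell never expands "$line". Awk receives the literal text $line, treats line as one of its own (unset) variables, and so evaluates $line as $0, the whole record, which never equals $12. The quick fix is

awk -v num="$line" '$12==num' R2_Output.txt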
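
For completeness, here is a sketch of the whole loop with that fix applied. The files demo_list.txt and demo_input.txt are made-up stand-ins (not your data), with the number in the 12th tab-separated field:

```shell
# Hypothetical stand-in data: a list of numbers and a 12-field, tab-separated file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 318 407 > demo_list.txt
printf 'r1\ta\tb\tc\td\te\tf\tg\th\ti\tj\t318\n' >  demo_input.txt
printf 'r2\ta\tb\tc\td\te\tf\tg\th\ti\tj\t407\n' >> demo_input.txt
printf 'r3\ta\tb\tc\td\te\tf\tg\th\ti\tj\t999\n' >> demo_input.txt

# -v copies the shell value into the awk variable "num";
# the awk program itself can stay in single quotes.
while IFS= read -r line; do
    awk -v num="$line" '$12 == num' demo_input.txt > "reads_$line.txt"
done < demo_list.txt
```

Each pass of the loop rereads demo_input.txt in full, which is the cost discussed next.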

But don't do that. That loop reads R2_Output.txt once for each line in the numbers file. You can get the same result by reading through each file only one time:

awk '
    # read the list of numbers in truncations_list
    FNR == NR {
        num[$1]
        next
    }

    # process each line of the output file
    # any lines with an "unknown" $12 will be ignored
    $12 in num {
        f = "reads_" $12 ".txt"
        print >> f
    }
' truncations_list.txt R2_Output.txt
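
If you want to try the idea on something small first, this is a rough sketch on made-up files (demo_list.txt and demo_input.txt are assumptions for illustration). NR counts records across all input files while FNR restarts at each file, so FNR == NR holds only while awk is reading the first file:

```shell
# Hypothetical stand-in data; the 12th whitespace-separated field is the number.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 3 318 > demo_list.txt
printf 'r1 a b c d e f g h i j 318\n' >  demo_input.txt
printf 'r2 a b c d e f g h i j 999\n' >> demo_input.txt
printf 'r3 a b c d e f g h i j 318\n' >> demo_input.txt

awk '
    # first file only: remember each listed number as an array key
    FNR == NR { num[$1]; next }
    # second file: "in" is an exact key lookup, so 3 does not match 318
    $12 in num { print > ("reads_" $12 ".txt") }
' demo_list.txt demo_input.txt
```

Only reads_318.txt is created here, holding the r1 and r3 lines; the 999 line is skipped because 999 is not in the list.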

Upvotes: 3
