Reputation: 2356
I have a file that contains a list of numbers. I have a second file with various entries and several fields each.
I want to take every line whose 12th field equals the first number and write those lines to a new file, then do the same for the second number, and so on.
I wrote a one-liner that looks right to me, but I can't figure out why it doesn't work.
This is the list of numbers:
cat truncations_list.txt
3
318
407
412
7
The file with the entries to be sorted is:
M00970:45:000000000-A42FD:1:1101:14736:1399 TGCCCAGTGCTCTGAATGTNNNNNTGAAGAAATTCAAGTAAGCGCGGGTCATCGGCNGGAGTAACTATGACTCTNTTAAGGAGGACCAATATGAACCANACNNNNNNNNNACTNTATCTAGGGTTCCCTGCACAGTATGTGNCC 79 TGCCCAGTGCTCTGAATGTNNNNNTGAAGAAATTCAAGTAAGCGCGGGTCATCGGCNGGAGTAACTATGACTCTNTTAA 65 GGAGGACCAATATGAACCANACNNNNNNNNNACTNTATCTAGGGTTCCCTGCACAGTATGTGNCC 79S65M 1 81 TGCCCAGTGCTCTGAATGTNNNNNTGAAGAAATTCAAGTAAGCGCGGGTCATCGGCNGGAGTAACTATGACTCTNTTAAGG -2 318
M00970:45:000000000-A42FD:1:1101:15371:1399 TGCCCAGTGCTCTGAATGTNNNNNTGAAGAAATTCAAGTAAGCGCGGGTCAACGGCNGGAGTAACTATGACTCTNTTAAGGAGTCGGTGTTCACATGCNATNNNNNNNNNCAGNCGAACTTGATGAAGAACGTCGACGTGTNGG 83 TGCCCAGTGCTCTGAATGTNNNNNTGAAGAAATTCAAGTAAGCGCGGGTCAACGGCNGGAGTAACTATGACTCTNTTAAGGAG 61 TCGGTGTTCACATGCNATNNNNNNNNNCAGNCGAACTTGATGAAGAACGTCGACGTGTNGG 83S61M 1 81 TGCCCAGTGCTCTGAATGTNNNNNTGAAGAAATTCAAGTAAGCGCGGGTCAACGGCNGGAGTAACTATGACTCTNTTAAGG 2 407
This is my command:
file="truncations_list.txt"
while read line; do awk '$12==$line' R2_Output.txt >reads_$line.txt ; done <"$file"
This command creates all the expected files (reads_412.txt, etc.), but every one of them is empty.
I appreciate your help!
Upvotes: 0
Views: 6318
Reputation: 2648
Minimizing references to $x field variables can improve Awk performance. It mostly matters for more complex scripts, but it's worth trying this slight optimization if you are processing large files with millions of records:
awk 'FNR==NR {a[$1]; next} (f=$12) in a {print >f}' trunc.txt R2_Out.txt
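Note that the one-liner as written sends each line to a file named after the bare number (318, 407, ...). If you want the reads_N.txt names from the question, a minimal sketch could look like the following. The sample rows are made up for illustration, with just 12 whitespace-separated fields each, and everything runs in a scratch directory:

```shell
# Work in a scratch directory so nothing real is overwritten.
cd "$(mktemp -d)" || exit 1

# Made-up stand-ins for the two input files (12 fields per row).
printf '%s\n' 3 318 407 > trunc.txt
printf '%s\n' \
  'r1 a b c d e f g h i j 318' \
  'r2 a b c d e f g h i j 407' \
  'r3 a b c d e f g h i j 999' \
  'r4 a b c d e f g h i j 318' > R2_Out.txt

# First file: remember each number as an array key.
# Second file: copy $12 into f once, then reuse f for both the
# membership test and the output file name.
awk 'FNR==NR {a[$1]; next} (f=$12) in a {print > ("reads_" f ".txt")}' trunc.txt R2_Out.txt

# Show how many lines landed in each output file.
wc -l reads_*.txt
```

The parentheses around the file name expression are there because unparenthesized concatenation after > is not handled the same way by every awk implementation.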
Upvotes: 0
Reputation: 247042
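Your main problem is that the awk program is in single quotes, so the shell never expands "$line". Awk instead sees its own variable named line, which is unset, so $line means $0 and the test becomes $12==$0, which is never true for these records. That is why every output file is empty. The quick fix is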
awk -v num="$line" '$12==num' R2_Output.txt
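Dropped into the loop from the question, the quick fix might look like the sketch below. The two input files are tiny made-up stand-ins, and the broken_N.txt files are a hypothetical extra that exists only to put the original form next to the fixed one:

```shell
# Work in a scratch directory so nothing real is overwritten.
cd "$(mktemp -d)" || exit 1

# Made-up stand-ins for the two input files (12 fields per row).
printf '%s\n' 318 407 > truncations_list.txt
printf '%s\n' \
  'r1 a b c d e f g h i j 318' \
  'r2 a b c d e f g h i j 407' > R2_Output.txt

file="truncations_list.txt"
while read -r line; do
  # Form from the question: the shell does not expand $line inside
  # single quotes, so nothing matches and the file stays empty.
  awk '$12==$line' R2_Output.txt > "broken_$line.txt"

  # Quick fix: pass the shell value in as the awk variable num.
  awk -v num="$line" '$12==num' R2_Output.txt > "reads_$line.txt"
done < "$file"

# Compare line counts of the broken and fixed outputs.
wc -l broken_*.txt reads_*.txt
```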
But don't do that: it reads R2_Output.txt once for every line in the numbers file. You can get the same result by reading each file only once:
awk '
    # read the list of numbers in truncations_list
    FNR == NR {
        num[$1]
        next
    }

    # process each line of the output file
    # any lines with an "unknown" $12 will be ignored
    $12 in num {
        f = "reads_" $12 ".txt"
        print >> f
    }
' truncations_list.txt R2_Output.txt
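One thing to watch with >> is that it appends to whatever is already in the file, so running the script twice doubles the contents. A minimal sketch of one way to guard against that is below. The sample data is made up, and split_reads is a hypothetical wrapper name, not something from the question:

```shell
# Work in a scratch directory so nothing real is overwritten.
cd "$(mktemp -d)" || exit 1

# Made-up stand-ins for the two input files (12 fields per row).
printf '%s\n' 3 318 407 > truncations_list.txt
printf '%s\n' \
  'r1 a b c d e f g h i j 318' \
  'r2 a b c d e f g h i j 407' \
  'r3 a b c d e f g h i j 318' > R2_Output.txt

# split_reads is a hypothetical helper wrapping the single-pass script.
split_reads() {
  # Clear results of any earlier run, because >> appends to existing files.
  rm -f reads_*.txt
  awk '
    FNR == NR { num[$1]; next }
    $12 in num {
      f = "reads_" $12 ".txt"
      print >> f
    }
  ' truncations_list.txt R2_Output.txt
}

split_reads
split_reads   # the second run starts from clean files, so nothing is duplicated

# Show how many lines landed in each output file.
wc -l reads_*.txt
```

An alternative would be to use > instead of >>, since awk truncates a file only the first time it opens it within a run and then keeps writing to it.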
Upvotes: 3