Reputation: 341
I have two files.
File indv:
COPDGene_P51515
COPDGene_V67803
COPDGene_Z75868
COPDGene_U48329
COPDGene_R08908
COPDGene_E34944
File data:
COPDGene_Z75868 1
COPDGene_A12318 3
COPDGene_R08908 5
COPDGene_P51515 8
COPDGene_U48329 2
COPDGene_V67803 8
COPDGene_E34944 2
COPDGene_D29835 9
I want to print the lines of data whose first field appears in indv, in the order given by indv, like the following:
COPDGene_P51515 8
COPDGene_V67803 8
COPDGene_Z75868 1
COPDGene_U48329 2
COPDGene_R08908 5
COPDGene_E34944 2
I tried to use
awk 'NR==FNR{a[$1]++;next} ($1 in a)' indv data
But I got
COPDGene_Z75868 1
COPDGene_R08908 5
COPDGene_P51515 8
COPDGene_U48329 2
COPDGene_V67803 8
COPDGene_E34944 2
which is not the order of indv.
Upvotes: 3
Views: 75
Reputation: 31
awk 'FNR==NR{a[$1]=$2;next} ($1 in a){print $1,a[$1]}' data indv
COPDGene_P51515 8
COPDGene_V67803 8
COPDGene_Z75868 1
COPDGene_U48329 2
COPDGene_R08908 5
COPDGene_E34944 2
Advantages: Only the second field of each line in data is stored in memory, rather than the full record. Nothing is printed for an entry in indv that has no match in data.
Disadvantages: If an ID appears on more than one line of data, only the last entry is kept.
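If data can contain repeated IDs, one possible workaround is to append every matching line to the array entry instead of overwriting it. This is only a sketch: the temp directory, the shortened sample files, and the extra COPDGene_P51515 line are made up for illustration and are not part of the original data.

```shell
# Assumption: throwaway sample files in a temp directory. The duplicate
# COPDGene_P51515 line is invented here to show the behaviour.
dup_tmp=$(mktemp -d)
printf '%s\n' COPDGene_P51515 COPDGene_Z75868 > "$dup_tmp/indv"
printf '%s\n' 'COPDGene_Z75868 1' 'COPDGene_P51515 8' 'COPDGene_P51515 4' > "$dup_tmp/data"

# First file (data): collect every line for an ID, newline-separated.
# Second file (indv): print the collected lines for IDs that were seen.
dup_result=$(awk '
  FNR==NR { if ($1 in a) a[$1] = a[$1] "\n" $0; else a[$1] = $0; next }
  ($1 in a) { print a[$1] }
' "$dup_tmp/data" "$dup_tmp/indv")

printf '%s\n' "$dup_result"
rm -rf "$dup_tmp"
```

The trade-off is that this stores the full record again, so it gives up the memory advantage in exchange for keeping every duplicate, in the order the duplicates appear in data.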
Upvotes: 3
Reputation: 113924
$ awk 'FNR==NR{a[$1]=$0;next;} {print a[$1]}' data indv
COPDGene_P51515 8
COPDGene_V67803 8
COPDGene_Z75868 1
COPDGene_U48329 2
COPDGene_R08908 5
COPDGene_E34944 2
FNR==NR{a[$1]=$0;next;}
For the first file read, data, save each line in the associative array a under the index of its first field, $1. Skip the rest of the commands and start over on the next line.
print a[$1]
If we get here, we are working on the second file, indv. For this file, print the line from data that corresponds to the first field on this line. In this way, the contents of each line are controlled by data but the order of printing is controlled by indv.
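One thing to watch with a bare print a[$1]: an ID in indv that never appears in data yields an empty output line. A guarded variant, sketched below, prints only when the ID was actually stored. The temp directory, the shortened sample files, and the COPDGene_X00000 entry are hypothetical and exist only to exercise the unmatched case.

```shell
# Assumption: throwaway sample files; COPDGene_X00000 is an invented ID
# with no line in data.
ord_tmp=$(mktemp -d)
printf '%s\n' COPDGene_P51515 COPDGene_X00000 COPDGene_Z75868 > "$ord_tmp/indv"
printf '%s\n' 'COPDGene_Z75868 1' 'COPDGene_A12318 3' 'COPDGene_P51515 8' > "$ord_tmp/data"

# Store each data line under its first field, then walk indv in order and
# print only the IDs that have a stored line.
ordered=$(awk 'FNR==NR { a[$1] = $0; next } ($1 in a) { print a[$1] }' "$ord_tmp/data" "$ord_tmp/indv")

printf '%s\n' "$ordered"
rm -rf "$ord_tmp"
```

Using ($1 in a) rather than testing a[$1] directly also avoids creating empty array entries as a side effect of the lookup.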
Upvotes: 4