Mike Brown

Reputation: 341

How to print the lines that contain certain strings, in a given order?

I have two files

file indv

COPDGene_P51515
COPDGene_V67803
COPDGene_Z75868
COPDGene_U48329
COPDGene_R08908
COPDGene_E34944

file data

    COPDGene_Z75868  1   
    COPDGene_A12318  3
    COPDGene_R08908  5
    COPDGene_P51515  8
    COPDGene_U48329  2
    COPDGene_V67803  8
    COPDGene_E34944  2
    COPDGene_D29835  9

I want to print the lines of data that contain the strings listed in indv, in the order they appear in indv, like the following:

COPDGene_P51515  8
COPDGene_V67803  8
COPDGene_Z75868  1
COPDGene_U48329  2
COPDGene_R08908  5
COPDGene_E34944  2

I tried to use

awk 'NR==FNR{a[$1]++;next} ($1 in a)' indv data

But I got

        COPDGene_Z75868  1   
        COPDGene_R08908  5
        COPDGene_P51515  8
        COPDGene_U48329  2
        COPDGene_V67803  8
        COPDGene_E34944  2

which follows the order of data, not the order of indv.

Upvotes: 3

Views: 75

Answers (2)

Aia

Reputation: 31

awk 'FNR==NR{a[$1]=$2;next} a[$1]{print $1,a[$1]}' data indv
COPDGene_P51515 8
COPDGene_V67803 8
COPDGene_Z75868 1
COPDGene_U48329 2
COPDGene_R08908 5
COPDGene_E34944 2

Advantages: Only the second field of each record from data is stored in memory, rather than the full record. Nothing is printed for an entry in indv that has no match in data.

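Disadvantages: If an ID appears more than once in data, only its last entry is kept. The pattern a[$1] also tests the stored value itself, so an ID whose second field is 0 or empty is treated as unmatched.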
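
If duplicates in data matter, one possible variant is sketched below. It is only an illustration, not a replacement for the one-liner above: the IDs (ID_A, ID_B, ID_C), the scratch directory, and the variable name all_matches are made up for the example. It appends every entry for an ID instead of overwriting, and tests membership with `in` so that a value of 0 still counts as a match.

```shell
# Hypothetical sample files in a scratch directory (not the question's data).
tmpdir=$(mktemp -d)
printf '%s\n' ID_B ID_A > "$tmpdir/indv"
printf '%s\n' 'ID_A 1' 'ID_B 0' 'ID_A 7' 'ID_C 4' > "$tmpdir/data"

all_matches=$(awk '
    # Pass 1 (data): collect every "ID value" pair seen for an ID,
    # joining repeated IDs with the output record separator.
    FNR==NR {
        if ($1 in a) {
            a[$1] = a[$1] ORS $1 OFS $2
        } else {
            a[$1] = $1 OFS $2
        }
        next
    }
    # Pass 2 (indv): print in the order of indv; "in" checks for
    # membership only, so a stored value of 0 is still printed.
    ($1 in a) { print a[$1] }
' "$tmpdir/data" "$tmpdir/indv")

printf '%s\n' "$all_matches"
rm -rf "$tmpdir"
```

The trade-off is that this stores a string per ID that grows with the number of duplicates, so it gives up part of the memory advantage mentioned above.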

Upvotes: 3

John1024

Reputation: 113924

$ awk 'FNR==NR{a[$1]=$0;next;} {print a[$1]}' data indv
COPDGene_P51515  8
COPDGene_V67803  8
COPDGene_Z75868  1
COPDGene_U48329  2
COPDGene_R08908  5
COPDGene_E34944  2

How it works

  • FNR==NR{a[$1]=$0;next;}

    FNR==NR is true only while the first file, data, is being read. For each of its lines, save the whole line in associative array a under the index of its first field, $1. Then skip the rest of the commands and start over on the next line.

  • print a[$1]

    If we get here, we are working on the second file, indv. For each of its lines, print the line from data that was saved under this line's first field. In this way, the contents of each output line are determined by data, but the order of printing is determined by indv. An ID in indv that has no match in data prints as an empty line.
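
To try the two steps above end to end, here is a minimal, self-contained sketch. It recreates the question's sample files in a scratch directory (the directory and the variable name ordered are assumptions for the example) and adds one optional guard, `($1 in a)`, which is not part of the command above: it skips IDs from indv that never appeared in data instead of printing an empty line for them.

```shell
# Recreate the two sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' COPDGene_P51515 COPDGene_V67803 COPDGene_Z75868 \
    COPDGene_U48329 COPDGene_R08908 COPDGene_E34944 > "$tmpdir/indv"
cat > "$tmpdir/data" <<'EOF'
COPDGene_Z75868  1
COPDGene_A12318  3
COPDGene_R08908  5
COPDGene_P51515  8
COPDGene_U48329  2
COPDGene_V67803  8
COPDGene_E34944  2
COPDGene_D29835  9
EOF

# Pass 1 (data): remember each full line under its first field.
# Pass 2 (indv): print the remembered line, but only for IDs seen in data.
ordered=$(awk 'FNR==NR { a[$1] = $0; next } ($1 in a) { print a[$1] }' \
    "$tmpdir/data" "$tmpdir/indv")

printf '%s\n' "$ordered"
rm -rf "$tmpdir"
```

With the sample data every ID in indv has a match, so the guard changes nothing here and the output is the same six lines shown above.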

Upvotes: 4
