biotech

Reputation: 727

Modify awk code

I would like to print a row of input file 11 only if it contains fewer than two of the strings listed in NV_1.tab. Right now the script does not catch the strings in file 11 because it looks for an exact match, while the fields in 11 carry a suffix (for example HS302_01873). The script needs to be adjusted to catch them. I tried adding [^0-9] next to $i, but that does not seem to be allowed.

Thanks, Bernardo

awk 'NR==FNR{a[$1]++; next}
     {
         c=0
         for(i=2;i<=NF;i++){
             if($i in a){c++}
         }
     }
     c<=1' NV_1.tab 11

#NV_1.tab
HS302
HS303
HS304
HS305
HS319
HS321
HS322
HS323
HS324
HS326
HS327
HS328
HS329
HS330
HS331
HS332
HPSD74

#11
HPNK_11595  HS302_01873 HS303_01073
HPNK_11596  HPNK_11596  HPS_02673   HS302_01873

#current output
HPNK_11595  HS302_01873 HS303_01073
HPNK_11596  HPNK_11596  HPS_02673   HS302_01873

#desired output
HPNK_11596  HPNK_11596  HPS_02673   HS302_01873

Upvotes: 0

Views: 25

Answers (1)

Etan Reisner

Reputation: 80931

The simplest way I see to do this is to strip the suffix from each field before the lookup.

Inside the for loop, add:

s=$i
gsub(/_.*$/, "", s)

Then replace ($i in a) with (s in a). Because gsub works on the copy s, $i and the printed row stay unchanged.
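
Put together with the script from the question, the change might look like the sketch below. The wrapper function name filter_rows is hypothetical and only there to make the command reusable; it assumes the IDs are in the first column of the first file and that every suffix starts at the first underscore.

```shell
# filter_rows IDS_FILE ROWS_FILE
# Hypothetical wrapper: prints the rows of ROWS_FILE in which fewer than two
# fields (from field 2 onward) match an ID in IDS_FILE once the suffix
# beginning at the first underscore has been removed.
filter_rows() {
    awk 'NR == FNR { a[$1]++; next }   # first file: remember every ID
         {
             c = 0
             for (i = 2; i <= NF; i++) {
                 s = $i                # work on a copy so the row is not altered
                 gsub(/_.*$/, "", s)   # HS302_01873 becomes HS302
                 if (s in a) c++
             }
         }
         c <= 1' "$1" "$2"             # print rows with at most one match
}
```

It would be called as filter_rows NV_1.tab 11. With the sample data from the question, the first row has two matches (HS302 and HS303) and is dropped, while the second has one (HS302) and is printed.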

Upvotes: 1
