justaguy

Reputation: 3022

Remove lines that do not match specific digits in a list file using awk

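I am trying to use awk to remove the lines in file whose $2 does not match the digits after the NM_ but before the . in $2 of list. My attempt below adds the version suffix to the matching line, but it still prints the line that does not match. Thank you :).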

file

204 NM_003852   chr7    +   138145078   138270332   138145293   
204 NM_015905   chr7    +   138145078   138270332   138145293   

list

TRIM24 NM_015905.2

awk

awk -v OFS="\t" '{ sub(/\r/, "") } ; NR==FNR { N=$2 ; sub(/\..*/, "", $2); A[$2]=N; next } ; $2 in A { $2=A[$2] } 1' list file > out

current output

204 NM_003852   chr7    +   138145078   138270332   138145293   
204 NM_015905.2 chr7    +   138145078   138270332   138145293   

desired output (line 1 removed as that is the line that does not match)

204 NM_015905.2 chr7    +   138145078   138270332   138145293

Upvotes: 1

Views: 71

Answers (2)

Ed Morton

Reputation: 203995

$ awk -F'[ .]' 'NR==FNR{a[$2];next}$2 in a' list file
204 NM_015905   chr7    +   138145078   138270332   138145293

Upvotes: 1

karakfa

Reputation: 67507

awk 'NR==FNR{split($2,f2,".");a[f2[1]];next} $2 in a' list file
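
The command above keeps only the matching lines but prints $2 without the version suffix. If the versioned ID from list should also appear, as in the desired output, one possible variation is sketched below. It assumes both files are whitespace-separated and that tab-separated output is acceptable. The temporary directory and the variable names (tmp, result, full, map) are made up for illustration.

```shell
# Hypothetical scratch files recreating the sample data from the question
# (assumption: fields are separated by spaces and/or tabs).
tmp=$(mktemp -d)
printf '204 NM_003852\tchr7\t+\t138145078\t138270332\t138145293\n' >  "$tmp/file"
printf '204 NM_015905\tchr7\t+\t138145078\t138270332\t138145293\n' >> "$tmp/file"
printf 'TRIM24 NM_015905.2\n' > "$tmp/list"

result=$(awk -v OFS='\t' '
    { sub(/\r$/, "") }              # tolerate DOS line endings in either file
    NR == FNR {                     # first file on the command line: list
        full = $2                   # versioned ID, e.g. NM_015905.2
        sub(/\..*/, "", $2)         # strip the version, leaving NM_015905
        map[$2] = full              # unversioned ID -> versioned ID
        next
    }
    $2 in map {                     # second file: act on matching lines only
        $2 = map[$2]                # put the versioned ID back into $2
        print                       # non-matching lines are never printed
    }
' "$tmp/list" "$tmp/file")

printf '%s\n' "$result"
rm -r "$tmp"
```

The difference from the attempt in the question is that there is no trailing 1 pattern. That pattern prints every line, so here the print sits inside the $2 in map block instead.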

Upvotes: 2
