justaguy

Reputation: 3022

Add plus or minus 10 in awk if there is no exact match

I am trying to match every line of the file below against column 3 of bigfile. The awk below handles exact matches. The problem is that a line with no exact match should still count as a match if the bigfile coordinate is within plus or minus 10 of it. I am not sure how to tell awk that, if no exact match is found, it should try the coordinate plus or minus 10. If there is still no match after that, then the line has no match in the file. Thank you :).

file

955763
957852
976270

bigfile

chr1    955543  955763  chr1:955543-955763  AGRN-6|gc=75
chr1    957571  957852  chr1:957571-957852  AGRN-7|gc=61.2
chr1    970621  970740  chr1:970621-970740  AGRN-8|gc=57.1

awk

awk 'NR==FNR{A[$1];next}$3 in A' file bigfile > output

desired output (same format as bigfile)

chr1    955543  955763  chr1:955543-955763  AGRN-6|gc=75
chr1    957571  957852  chr1:957571-957852  AGRN-7|gc=61.2

Upvotes: 0

Views: 152

Answers (3)

Tom Fenech

Reputation: 74595

If there's no difference between a row that matches and one that's close, you could just set all of the keys in the range in the array:

awk 'NR == FNR { for (i = -10; i <= 10; ++i) A[$1+i]; next } 
$3 in A' file bigfile > output

The advantage of this approach is that only one lookup is performed per line of the big file.
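To see the range expansion at work, here is a minimal sketch with made-up sample data. The file names keys.txt and regions.txt, the shortened fourth column, and the AGRN-9 row (which ends 5 below the key 976270) are assumptions for illustration, not data from the question.

```shell
# Hypothetical sample data written to a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 955763 957852 976270 > "$tmpdir/keys.txt"
printf 'chr1\t955543\t955763\tAGRN-6\nchr1\t957571\t957852\tAGRN-7\nchr1\t976100\t976265\tAGRN-9\nchr1\t970621\t970740\tAGRN-8\n' > "$tmpdir/regions.txt"

# First file: store every value from key-10 to key+10 as an array index.
# Second file: print the line when column 3 is one of those indexes.
result=$(awk 'NR == FNR { for (i = -10; i <= 10; ++i) A[$1 + i]; next }
$3 in A' "$tmpdir/keys.txt" "$tmpdir/regions.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With this data the AGRN-6 and AGRN-7 rows match exactly, AGRN-9 matches through the expanded range, and AGRN-8 is dropped. The trade-off is memory: the array holds 21 entries per coordinate instead of one.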

Upvotes: 1

karakfa

Reputation: 67467

Your data already produces the desired output (all exact match).

$ awk 'NR==FNR{a[$1];next} $3 in a{print; next} 
              {for(k in a) 
                 if((k-$3)^2<=10^2) {print $0, " --> within 10 margin"; next}}' file bigfile

chr1    955543  955763  chr1:955543-955763  AGRN-6|gc=75
chr1    957571  957852  chr1:957571-957852  AGRN-7|gc=61.2
chr1    976251  976261  chr1:976251-976261  AGRN-8|gc=57.1  --> within 10 margin

I added a fake 4th row to get the margin match.

Upvotes: 1

anubhava

Reputation: 785058

You need to run a loop on array a:

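awk 'NR==FNR {
   a[$1]
   next
}
{
  for (i in a)
     # i+0 forces a numeric comparison; array keys are strings
     if (i+0 <= $3+10 && i+0 >= $3-10) {
        print
        next   # print each line at most once
     }
}' file bigfile > output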
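The margin does not have to be hard-coded. The sketch below is one possible variation that passes it in with -v and compares the absolute difference. The variable name tol, the file names keys.txt and regions.txt, and the AGRN-9 row (which ends 5 above the key 976270) are assumptions for illustration only.

```shell
# Hypothetical sample data written to a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 955763 957852 976270 > "$tmpdir/keys.txt"
printf 'chr1\t955543\t955763\tAGRN-6\nchr1\t976100\t976275\tAGRN-9\nchr1\t970621\t970740\tAGRN-8\n' > "$tmpdir/regions.txt"

matched=$(awk -v tol=10 'NR == FNR { a[$1]; next }
{
  for (k in a) {
    d = k - $3                      # subtraction makes the comparison numeric
    if (d < 0) d = -d               # absolute difference
    if (d <= tol) { print; next }   # stop at the first key in range
  }
}' "$tmpdir/keys.txt" "$tmpdir/regions.txt")

printf '%s\n' "$matched"
rm -rf "$tmpdir"
```

Here the AGRN-6 row matches exactly, AGRN-9 matches within the margin, and AGRN-8 is dropped. Unlike the range-expansion approach, this loops over every key for each unmatched line of the big file, so it costs more time but keeps the array small.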

Upvotes: 1
