filtering file dependent on a value falling within a range specified in another file

Question

I would like to filter file1 based on two criteria.

(a) Only include records where $1 can find a match with $1 in file2 (there will be multiple matches in many cases),

(b) When a match is found, $2 in file1 must fall within the range given by $2 and $3 on the matching line of file2.

file1:

seq_100|rf001 298 01 11 01 11
seq_0442|rf76 6000 01 11 10 00
seq_9999|rf54 5098 01 01 01 01

file2:

seq_100|rf001 0 679
seq_100|rf001 700 800
seq_100|rf001 19000 22000
seq_100|rf001 23000 23500
seq_9999|rf54 800 3000
seq_9999|rf54 7000 7800
seq_9999|rf54 8000 9000

Expected output:

seq_100|rf001 298 01 11 01 11

jaypal singh · Accepted Answer

Here is another way with awk:

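awk '
# First file (file1): store each record, keyed by column 1 and column 2.
NR==FNR {
  line[$1,$2] = $0
  next
}
# Second file (file2): $1 is the ID, $2 and $3 are the range bounds.
{
  for (key in line) {
    split(key, tmp, SUBSEP)
    # Same ID, and the file1 value lies strictly between the bounds.
    if (tmp[1] == $1 && tmp[2] > $2 && tmp[2] < $3)
      print line[key]
  }
}' file1 file2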

Output:

seq_100|rf001 298 01 11 01 11

Explanation:

We read file1 first and store each whole line in the array line, indexed by column 1 and column 2 (awk joins the two subscripts with SUBSEP to form a single key).
Once all of file1 is in memory, we read file2 and, for each of its lines, iterate over every key in line.
We split the key on SUBSEP and check that the first part equals column 1 of the file2 line and that the second part lies between columns 2 and 3 of that line. The comparison is strict, so a value equal to either bound is not matched.
If both checks pass, we print the stored line.
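
A variation, offered only as a sketch and not part of the original answer: load the ranges from file2 instead and stream file1. It assumes the bounds should be inclusive (the question only says "within a range") and that each file1 record should be printed at most once, in file1 order, even if several ranges match. The function name filter_by_range and the temporary directory are made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: $1 is the ranges file (file2), $2 is the data file (file1).
filter_by_range() {
    awk '
    NR == FNR {
        # Ranges file: append this range to the list kept for its ID.
        n = ++count[$1]
        lo[$1, n] = $2
        hi[$1, n] = $3
        next
    }
    {
        # Data file: test column 2 against every range stored for this ID.
        for (i = 1; i <= count[$1]; i++) {
            # Adding 0 forces a numeric comparison; bounds are inclusive (assumption).
            if ($2 + 0 >= lo[$1, i] + 0 && $2 + 0 <= hi[$1, i] + 0) {
                print
                break    # stop at the first matching range
            }
        }
    }' "$1" "$2"
}

# Sample data copied from the question, written to a scratch directory.
dir=$(mktemp -d)

cat > "$dir/file1" <<'EOF'
seq_100|rf001 298 01 11 01 11
seq_0442|rf76 6000 01 11 10 00
seq_9999|rf54 5098 01 01 01 01
EOF

cat > "$dir/file2" <<'EOF'
seq_100|rf001 0 679
seq_100|rf001 700 800
seq_100|rf001 19000 22000
seq_100|rf001 23000 23500
seq_9999|rf54 800 3000
seq_9999|rf54 7000 7800
seq_9999|rf54 8000 9000
EOF

filter_by_range "$dir/file2" "$dir/file1"
```

Note that the file order is reversed compared with the command above: the ranges file comes first. With this layout only file2 is held in memory, and each file1 record is compared only against the ranges for its own ID. Like any NR==FNR script, it assumes the first file is not empty.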
