user1308144

Reputation: 475

Filtering a file depending on whether a value falls within a range specified in another file

I would like to filter file1 based on two criteria:

(a) Only include records whose $1 matches $1 in file2 (in many cases there will be multiple matches).

(b) When a match is found, check that $2 in file1 falls within the range given by $2 and $3 of the matching line in file2.

file1:

seq_100|rf001 298 01 11 01 11
seq_0442|rf76 6000 01 11 10 00
seq_9999|rf54 5098 01 01 01 01

file2:

seq_100|rf001 0 679
seq_100|rf001 700 800
seq_100|rf001 19000 22000
seq_100|rf001 23000 23500
seq_9999|rf54 800 3000
seq_9999|rf54 7000 7800
seq_9999|rf54 8000 9000

Expected output:

seq_100|rf001 298 01 11 01 11

Upvotes: 0

Views: 93

Answers (2)

jaypal singh

Reputation: 77085

Here is another way with awk:

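awk '
# First file (file1): store each line, keyed by column 1 and column 2.
NR==FNR {
  line[$1,$2] = $0
  next
}
# Second file (file2): $1 is the ID, $2 and $3 are the range bounds.
{
  for (key in line) {
    split(key, tmp, SUBSEP)
    # Same ID, and the stored file1 value lies strictly inside the range.
    if (tmp[1] == $1 && tmp[2] > $2 && tmp[2] < $3)
      print line[tmp[1],tmp[2]]
  }
}' file1 file2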

Output:

seq_100|rf001 298 01 11 01 11

Explanation:

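  • We read file1 first and store each entire line in the array line, indexed by column 1 and column 2 (awk joins the two subscripts with SUBSEP into a single key).
  • Once all of file1 is in memory, we read file2 and, for each of its lines, iterate over every key in line.
  • We split the key and check whether its first part equals column 1 of the current file2 line and whether its second part lies strictly between columns 2 and 3 of that line.
  • If both conditions hold, we print the stored file1 line. Output therefore follows the order of file2, and a file1 line is printed once for every range that contains it.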
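To make the key handling described above concrete, here is a minimal sketch of how awk stores a two-subscript index as one string and how split() with SUBSEP recovers the parts. The sample key is taken from file1 in the question; the variable name result is only there for illustration.

```shell
# Show how a two-subscript awk key is stored and split back apart.
result=$(awk 'BEGIN {
  # line[a, b] is stored under the single string key: a SUBSEP b
  line["seq_100|rf001", 298] = "seq_100|rf001 298 01 11 01 11"
  for (key in line) {
    n = split(key, tmp, SUBSEP)   # n is the number of subscripts recovered
    print n, tmp[1], tmp[2]
  }
}')
echo "$result"
```

The "|" inside the ID is harmless here because the default SUBSEP is a non-printing character, not "|".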

Upvotes: 2

sat

Reputation: 14949

You can try this awk one-liner:

awk 'NR==FNR{ if($1 in a) a[$1]=a[$1]","$2" "$3; else a[$1]=$2" "$3;next;} {n=split(a[$1],arr,",");for(i=1;i<=n;i++){split(arr[i],b," ");if( $2 > b[1] && $2 < b[2] ){ print $0;} }}' file2 file1

Or, as an awk script:

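# First file (file2): collect every "low high" range for an ID,
# joined with commas into one string per ID.
NR==FNR{
        if($1 in a)
                a[$1]=a[$1]","$2" "$3;
        else
                a[$1]=$2" "$3;
        next;
}
# Second file (file1): test $2 against each stored range for this ID.
{
        n=split(a[$1],arr,",");
        for(i=1;i<=n;i++){
                split(arr[i],b," ");
                # Strict comparison: the range bounds themselves do not match.
                if( $2 > b[1] && $2 < b[2] ){
                        print $0;
                }
        }
}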

Test:

sat:~# awk -f sample.awk  file2 file1 
seq_100|rf001 298 01 11 01 11
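For anyone who wants to reproduce this without creating the files by hand, the following is a self-contained sketch that writes the sample data from the question into a scratch directory and runs the same logic. It is a variant, not the script above verbatim: the break is an added assumption that each file1 record should be printed at most once even if several ranges contain it, and the names dir and result are only illustrative.

```shell
# Scratch directory for the sample data copied from the question.
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
seq_100|rf001 298 01 11 01 11
seq_0442|rf76 6000 01 11 10 00
seq_9999|rf54 5098 01 01 01 01
EOF
cat > "$dir/file2" <<'EOF'
seq_100|rf001 0 679
seq_100|rf001 700 800
seq_100|rf001 19000 22000
seq_100|rf001 23000 23500
seq_9999|rf54 800 3000
seq_9999|rf54 7000 7800
seq_9999|rf54 8000 9000
EOF

result=$(awk '
NR == FNR {                 # first file (file2): collect ranges per ID
  a[$1] = ($1 in a) ? (a[$1] "," $2 " " $3) : ($2 " " $3)
  next
}
{                           # second file (file1): test $2 against each range
  n = split(a[$1], arr, ",")
  for (i = 1; i <= n; i++) {
    split(arr[i], b, " ")
    if ($2 > b[1] && $2 < b[2]) {
      print
      break                 # assumption: one copy per record is enough
    }
  }
}' "$dir/file2" "$dir/file1")
echo "$result"
rm -rf "$dir"
```

If the range bounds should count as matches, the comparisons would become >= and <=; the question does not say which is intended.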

Upvotes: 1
