Reputation: 475
I would like to filter file1 based on two criteria:
(a) Only include records where $1 can find a match with $1 in file2 (there will be multiple matches in many cases).
(b) When a match is found, check that $2 in file1 falls within a range specified by $2 and $3 in file2.
file1:
seq_100|rf001 298 01 11 01 11
seq_0442|rf76 6000 01 11 10 00
seq_9999|rf54 5098 01 01 01 01
file2:
seq_100|rf001 0 679
seq_100|rf001 700 800
seq_100|rf001 19000 22000
seq_100|rf001 23000 23500
seq_9999|rf54 800 3000
seq_9999|rf54 7000 7800
seq_9999|rf54 8000 9000
Expected output:
seq_100|rf001 298 01 11 01 11
Upvotes: 0
Views: 93
Reputation: 77085
Here is another way with awk:
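awk '
NR==FNR {                          # first file (file1): index each record by $1 and $2
    line[$1,$2] = $0
    next
}
{                                  # second file (file2): $2 and $3 bound a range
    for (key in line) {
        split(key, tmp, SUBSEP)    # recover the two parts of the stored key
        if (tmp[1] == $1 && tmp[2] > $2 && tmp[2] < $3)
            print line[tmp[1],tmp[2]]
    }
}' file1 file2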
Output:
seq_100|rf001 298 01 11 01 11
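Explanation:
While reading file1 (NR==FNR), each record is stored in the array line under the key $1,$2.
For every line of file2, the loop walks all stored keys, splits each key back into its two parts on SUBSEP, and prints the stored file1 record when the first part equals $1 and the second part lies strictly between $2 and $3 of the current file2 line.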
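One caveat with the loop above: for (key in line) visits keys in an unspecified order, so with more matches the output need not follow the order of file1, and a record would print once per matching range if ranges overlapped. A minimal sketch of an alternative, assuming the sample files from the question are recreated in a scratch directory: it reads file2 first, then streams file1 so that input order is kept and each record prints at most once. The array names cnt, lo and hi are my own choice, not part of the answer.

```shell
# Recreate the sample data from the question in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
seq_100|rf001 298 01 11 01 11
seq_0442|rf76 6000 01 11 10 00
seq_9999|rf54 5098 01 01 01 01
EOF
cat > "$tmp/file2" <<'EOF'
seq_100|rf001 0 679
seq_100|rf001 700 800
seq_100|rf001 19000 22000
seq_100|rf001 23000 23500
seq_9999|rf54 800 3000
seq_9999|rf54 7000 7800
seq_9999|rf54 8000 9000
EOF

result=$(awk '
NR == FNR {                      # first file (file2): remember every range per key
    n = ++cnt[$1]
    lo[$1, n] = $2 + 0           # "+ 0" forces a numeric value
    hi[$1, n] = $3 + 0
    next
}
{                                # second file (file1): test each record in order
    for (i = 1; i <= cnt[$1]; i++)
        if ($2 > lo[$1, i] && $2 < hi[$1, i]) {
            print                # same strict comparison as the answer above
            break                # print each record at most once
        }
}' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With the sample data this prints the single record seq_100|rf001 298 01 11 01 11, the same as the expected output in the question.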
Upvotes: 2
Reputation: 14949
You can try this awk one-liner:
awk 'NR==FNR{ if($1 in a) a[$1]=a[$1]","$2" "$3; else a[$1]=$2" "$3;next;} {n=split(a[$1],arr,",");for(i=1;i<=n;i++){split(arr[i],b," ");if( $2 > b[1] && $2 < b[2] ){ print $0;} }}' file2 file1
Or, as an awk script:
NR==FNR{
    if($1 in a)
        a[$1]=a[$1]","$2" "$3;
    else
        a[$1]=$2" "$3;
    next;
}
{
    n=split(a[$1],arr,",");
    for(i=1;i<=n;i++){
        split(arr[i],b," ");
        if( $2 > b[1] && $2 < b[2] ){
            print $0;
        }
    }
}
Test:
sat:~# awk -f sample.awk file2 file1
seq_100|rf001 298 01 11 01 11
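Note that the comparison is strict, so a value equal to a range endpoint is not printed. The question does not say whether endpoints count as inside the range; if they should, a sketch along these lines could be used. The inclusive variable, the filter shell function and the one-record file1 are hypothetical, made up only to show the boundary case (679 sits exactly on the end of the range 0 679).

```shell
tmp=$(mktemp -d)
# Hypothetical record: the value 679 is exactly an endpoint of the first range.
printf '%s\n' 'seq_100|rf001 679 01 11 01 11' > "$tmp/file1"
printf '%s\n' 'seq_100|rf001 0 679' 'seq_100|rf001 700 800' > "$tmp/file2"

filter() {   # argument: 1 for inclusive endpoints, 0 for strict comparison
    awk -v inclusive="$1" '
    NR == FNR { n = ++cnt[$1]; lo[$1, n] = $2 + 0; hi[$1, n] = $3 + 0; next }
    {
        for (i = 1; i <= cnt[$1]; i++) {
            l = lo[$1, i]; h = hi[$1, i]
            if (inclusive ? ($2 >= l && $2 <= h) : ($2 > l && $2 < h)) {
                print
                break            # stop at the first matching range
            }
        }
    }' "$tmp/file2" "$tmp/file1"
}

strict=$(filter 0)
closed=$(filter 1)
printf 'strict: [%s]\nclosed: [%s]\n' "$strict" "$closed"
rm -rf "$tmp"
```

Here the strict run prints nothing and the inclusive run prints the record, which makes the difference between the two readings of "within a range" visible.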
Upvotes: 1