Reputation: 85
I need help finding the lines in file1 that match file2 under the conditions below, and printing the matching lines from file1.
Conditions:
KEY: A file1 line may differ from a file2 line by +/- "one" repeat ONLY, in any one of the motifs. That is, the overall loss or gain must be exactly 1, regardless of which repeat it comes from.
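One way to read this rule, assuming the two repeat counts sit in columns 3 and 5 as in the sample files: a file1 line matches a file2 line when the absolute differences of the two counts add up to exactly 1. The sketch below only illustrates that test; the helper name `is_match` and its argument order are made up for the example.

```shell
# Hypothetical helper: prints 1 if counts (a3,a5) differ from (b3,b5)
# by an overall gain or loss of exactly one repeat, otherwise prints 0.
is_match() {
    awk -v a3="$1" -v a5="$2" -v b3="$3" -v b5="$4" '
        function abs(v) { return v < 0 ? -v : v }
        BEGIN { print (abs(a3 - b3) + abs(a5 - b5) == 1) }'
}

is_match 13 4 12 4   # +1 in the first repeat only
is_match 11 6 12 4   # -1 in one repeat and +2 in the other
is_match 12 4 12 4   # identical counts
```

The three calls print 1, 0 and 0: only the first pair has a total change of exactly one repeat.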
file1:
A [TAGA] 13 [CAGA] 4 TAGA 18 9015 0.13662
A [TAGA] 11 [CAGA] 4 TAGA 16 9006 0.136483
A [TAGA] 11 [CAGA] 3 TAGA 15 7000 0.106083
A [TAGA] 9 [CAGA] 3 TAGA 13 6177 0.0936108
A [TAGA] 12 [CAGA] 5 TAGA 18 5377 0.081487
A [TAGA] 12 [CAGA] 3 TAGA 16 4663 0.0706665
A [TAGA] 10 [CAGA] 4 TAGA 15 3351 0.0507835
A [TAGA] 14 [CAGA] 3 TAGA 18 1079 0.016352
A [TAGA] 8 [CAGA] 4 TAGA 13 317 0.00480405
A [TAGA] 11 [CAGA] 6 TAGA 18 235 0.00356136
file2:
A [TAGA] 10 [CAGA] 3 TAGA
A [TAGA] 12 [CAGA] 4 TAGA
B [AGAT] 10 [AGAC] 6
B [AGAT] 11 [AGAC] 5
desired output:
A [TAGA] 13 [CAGA] 4 TAGA 18 9015 0.13662
A [TAGA] 11 [CAGA] 4 TAGA 16 9006 0.136483
A [TAGA] 11 [CAGA] 3 TAGA 15 7000 0.106083
A [TAGA] 9 [CAGA] 3 TAGA 13 6177 0.0936108
A [TAGA] 12 [CAGA] 5 TAGA 18 5377 0.081487
A [TAGA] 12 [CAGA] 3 TAGA 16 4663 0.0706665
A [TAGA] 10 [CAGA] 4 TAGA 15 3351 0.0507835
Tried so far:
awk 'NR==FNR{a[$1,$2,$3]++;next}a[$1,$2,$3+1] || a[$1,$2,$3-1]' file2 file1
vWA [TAGA] 13 [CAGA] 4 TAGA 18 9015 0.13662
vWA [TAGA] 11 [CAGA] 4 TAGA 16 9006 0.136483
vWA [TAGA] 11 [CAGA] 3 TAGA 15 7000 0.106083
vWA [TAGA] 9 [CAGA] 3 TAGA 13 6177 0.0936108
vWA [TAGA] 11 [CAGA] 6 TAGA 18 235 0.00356136 (wrong under the conditions: [CAGA] 6 is a gain of 2)
It also misses some true results:
A [TAGA] 12 [CAGA] 5 TAGA 18 5377 0.081487
A [TAGA] 12 [CAGA] 3 TAGA 16 4663 0.0706665
A [TAGA] 10 [CAGA] 4 TAGA 15 3351 0.0507835
Here I am matching only the first three columns, but I need to extend the match to columns 4 and 5 as well (awk 'NR==FNR{a[$1,$4,$5]++;next}a[$1,$4,$5+1] || a[$1,$4,$5-1]').
But I am not sure how to satisfy all the conditions and get the desired output.
Please help! Thanks
Upvotes: 4
Views: 133
Reputation: 424
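The awk code below satisfies both conditions for the sample data. For each key (columns 1, 2 and 4) it records the minimum and maximum of columns 3 and 5 in file2, then prints the file1 lines whose values lie within 1 of that range.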
$ cat tagaawk.sh
awk 'NR==FNR{seen[$1$2$4]++;   # first file (file2): key = columns 1, 2 and 4
m=seen[$1$2$4]
x=col3_min[$1$2$4]
y=col3_max[$1$2$4]
z=col5_min[$1$2$4]
t=col5_max[$1$2$4]
col3_min[$1$2$4]=(m==1||$3<x)?$3:x   # running min/max of columns 3 and 5 per key
col3_max[$1$2$4]=($3>y)?$3:y
col5_min[$1$2$4]=(m==1||$5<z)?$5:z
col5_max[$1$2$4]=($5>t)?$5:t;
next}
{                              # second file (file1)
m=seen[$1$2$4]
x=col3_min[$1$2$4]
y=col3_max[$1$2$4]
z=col5_min[$1$2$4]
t=col5_max[$1$2$4]
# print when the key occurs in file2 and both counts lie within 1 of its range
if(m>0 && $3>=x-1 && $3<=y+1 && $5>=z-1 && $5<=t+1)
print $0}' file2 file1
OUTPUT
$ sh tagaawk.sh
A [TAGA] 13 [CAGA] 4 TAGA 18 9015 0.13662
A [TAGA] 11 [CAGA] 4 TAGA 16 9006 0.136483
A [TAGA] 11 [CAGA] 3 TAGA 15 7000 0.106083
A [TAGA] 9 [CAGA] 3 TAGA 13 6177 0.0936108
A [TAGA] 12 [CAGA] 5 TAGA 18 5377 0.081487
A [TAGA] 12 [CAGA] 3 TAGA 16 4663 0.0706665
A [TAGA] 10 [CAGA] 4 TAGA 15 3351 0.0507835
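Note that a range check of this kind can accept lines that are not within one repeat of any single file2 line. For example, a hypothetical line with counts 13 and 5 lies inside the range built from (10, 3) and (12, 4), yet it differs from the nearest file2 line by two repeats in total. For a stricter reading of the question's rule, a possible sketch stores every file2 line and compares each file1 line against them one by one. It assumes the counts are in columns 3 and 5 and the key is columns 1, 2 and 4; the function name `match_pm1` is invented for the example.

```shell
# Hypothetical helper: match_pm1 REFERENCE_FILE DATA_FILE
# Prints the DATA_FILE lines whose two counts differ from at least one
# REFERENCE_FILE line with the same key by a total of exactly one repeat.
match_pm1() {
    awk '
    function abs(v) { return v < 0 ? -v : v }
    NR == FNR {                      # first file: remember every reference line
        n++
        key[n] = $1 SUBSEP $2 SUBSEP $4
        c3[n] = $3
        c5[n] = $5
        next
    }
    {                                # second file: compare with each reference line
        k = $1 SUBSEP $2 SUBSEP $4
        for (i = 1; i <= n; i++)
            if (key[i] == k && abs($3 - c3[i]) + abs($5 - c5[i]) == 1) {
                print
                break                # one match is enough, avoid duplicates
            }
    }' "$1" "$2"
}
```

With the files from the question it would be called as `match_pm1 file2 file1`. The pairwise loop is quadratic in the worst case, which should be fine for short reference files like the one shown.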
Upvotes: 1