Reputation: 705
I have two files. The first file looks like this:
1 174392
1 230402
2 4933400
3 39322
4 42390021
5 80022392
6 3818110
and so on.
The second file looks like this:
chr1 23987 137011
chr1 220320 439292
chr2 220320 439292
chr2 2389328 3293292
chr3 392329 398191
chr4 421212 3292393
and so on.
I want to return the whole line from FILE2, provided that the first column in FILE1 equals the first column in FILE2 as a string match (after prefixing "chr") AND the 2nd column in FILE1 is greater than column 2 in FILE2 but less than column 3 in FILE2.
So in the above example, the line
1 230402
in FILE1 and
chr1 220320 439292
in FILE2 would satisfy the conditions, because 230402 is between 220320 and 439292, and 1 would be equal to chr1 after I make the strings match. Therefore that line in FILE2 would be printed.
The code I wrote was this:
#!/bin/bash
$F1="FILE1.txt"
read COL1 COL2
do
grep -w "chr$COL1" FILE2.tsv \
| awk -v C2=$COL2 '{if (C2>$1 && C2<$2); print $0}'
done < "$F1"
I have tried many variations of this. I do not care if the code is entirely in awk, entirely in bash, or a mixture.
Can anyone help?
Thank you!
Upvotes: 0
Views: 183
Reputation: 705
Thanks very much!
These answers work and are very helpful.
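Also, at long last I realized that the awk part should have been:
awk -v C2="$COL2" '{ if (C2 > $2 && C2 < $3) print $0 }'
The semicolon right after the condition ended the if statement, so the print ran for every line. The range is also in columns 2 and 3 of FILE2, not 1 and 2. With that fixed, the awk part would have been fine.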
At any rate, thank you very much!
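For anyone landing here later, a minimal sketch of how the original loop might look once repaired. It assumes whitespace-separated files like the samples in the question; the temporary directory and the shortened sample data are only there to make the sketch self-contained, and the variable names chr and pos are illustrative.

```shell
#!/bin/bash
# Sketch only: sample data copied from the question into a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' '1 174392' '1 230402' '2 4933400' > "$tmpdir/FILE1.txt"
printf '%s\n' 'chr1 23987 137011' 'chr1 220320 439292' 'chr2 220320 439292' > "$tmpdir/FILE2.tsv"

F1="$tmpdir/FILE1.txt"   # assignment takes no leading $
F2="$tmpdir/FILE2.tsv"

result=$(
  # "while" was missing in the original loop
  while read -r COL1 COL2; do
    # One awk call does both the exact match on column 1 and the range test,
    # so grep is not needed. A bare pattern prints the matching line.
    awk -v chr="chr$COL1" -v pos="$COL2" '$1 == chr && pos > $2 && pos < $3' "$F2"
  done < "$F1"
)
echo "$result"
rm -rf "$tmpdir"
```

This rereads FILE2 once per line of FILE1, so for large inputs the single-pass awk answers are likely the better choice.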
Upvotes: 0
Reputation: 782285
awk 'BEGIN {i = 0}
FNR == NR { chr[i] = "chr" $1; test[i++] = $2 }
FNR < NR {
    for (c in chr) {
        if ($1 == chr[c] && test[c] > $2 && test[c] < $3) { print }
    }
}' FILE1.txt FILE2.tsv
FNR is the line number within the current file; NR is the line number across all the input. So the first block processes the first file, collecting all its lines into arrays. The second block processes any remaining files, searching through the array of chrN values for a match and comparing the other two numbers to the number from the first file.
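To see how the two counters behave, here is a small sketch that prints NR and FNR for every line of two throwaway files. The file names one.txt and two.txt and their contents are made up for illustration.

```shell
#!/bin/bash
# Sketch only: two tiny made-up input files in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' a b   > "$tmpdir/one.txt"
printf '%s\n' c d e > "$tmpdir/two.txt"

# NR keeps counting across files, FNR restarts at 1 for each file,
# so NR == FNR is true only while the first file is being read.
counters=$(awk '{ print $1, NR, FNR, (NR == FNR ? "first" : "later") }' \
    "$tmpdir/one.txt" "$tmpdir/two.txt")
echo "$counters"
rm -rf "$tmpdir"
```

The lines from one.txt are tagged "first" and the lines from two.txt are tagged "later". One caveat: if the first file is empty, NR == FNR is also true while the second file is read, so the test no longer separates the two files.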
Upvotes: 1
Reputation: 77175
Here is one way using awk:
awk '
NR==FNR {
    $1 = "chr" $1
    seq[$1,$2]++;
    next
}
{
    for(key in seq) {
        split(key, tmp, SUBSEP);
        if(tmp[1] == $1 && $2 <= tmp[2] && tmp[2] <= $3 ) {
            print $0
        }
    }
}' file1 file2
Output:
chr1 220320 439292
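A possible variation on the same idea, sketched under the assumption that each FILE2 line should be printed at most once and that the bounds are exclusive, as in the question. Instead of scanning every stored key, it keeps one list of positions per chromosome. The array names pos and p and the temporary sample files are illustrative only.

```shell
#!/bin/bash
# Sketch only: the sample data from the question in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' '1 174392' '1 230402' '2 4933400' '3 39322' \
    '4 42390021' '5 80022392' '6 3818110' > "$tmpdir/FILE1.txt"
printf '%s\n' 'chr1 23987 137011' 'chr1 220320 439292' 'chr2 220320 439292' \
    'chr2 2389328 3293292' 'chr3 392329 398191' 'chr4 421212 3292393' > "$tmpdir/FILE2.tsv"

matches=$(awk '
NR == FNR {
    # First file: append each position to a space-separated list per chromosome.
    pos["chr" $1] = pos["chr" $1] " " $2
    next
}
($1 in pos) {
    # Second file: only look at positions stored for this chromosome.
    n = split(pos[$1], p, " ")
    for (i = 1; i <= n; i++) {
        if (p[i] + 0 > $2 && p[i] + 0 < $3) {
            print
            break    # print each range line at most once
        }
    }
}' "$tmpdir/FILE1.txt" "$tmpdir/FILE2.tsv")
echo "$matches"
rm -rf "$tmpdir"
```

With the sample data this selects only the chr1 220320 439292 line, since 230402 is the only position that falls strictly inside a range on its own chromosome.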
Upvotes: 1