Reputation: 99
I have two text files. The first one looks like this:
DB 41533499 41533500 14
CD 41533500 41533501 3
AR 41533504 41533505 5
DR 41533506 41533507 3
AR 41533508 41533509 1
AR 48743349 48743350 1
The second one looks like this:
DB 41533400 41533600
DR 41533300 41533800
AR 41533200 41533800
AR 48743100 48743983
In the first file, the difference between the 2nd and 3rd columns is always 1, which means each row is a point. I would like to make a new file containing the rows whose 1st column is common to both files and where the range given by the 2nd and 3rd columns of file 1 falls within the range given by the 2nd and 3rd columns of file 2. Here is the expected output:
DB 41533400 41533600 41533499 41533500 14
DR 41533300 41533800 41533506 41533507 3
AR 41533200 41533800 41533508 41533509 1
AR 48743100 48743983 48743349 48743350 1
I am trying to do this on the Linux command line and wrote the following, but did not get what I want:
awk '{print $1 "\t" $2 "\t" $3 "\t" }' file2.txt '{print $1 "\t" $2 "\t" $3 "\t" $4 }' file1.txt > output.txt
Do you know how to fix it?
Upvotes: 0
Views: 67
Reputation: 67467
Based on my free interpretation of the requirements (inferred from the row missing in the expected output), here is a version with pipes instead of a single awk script (which has already been answered):
$ join <(sort file2) <(sort file1) | # sort and join on key (1st field)
awk '$2<$4 && $3>$5' | # apply within range logic
sort -k6n | # sort ascending based on last field
awk '!a[$2]++' | # pick first instance of 2nd field (the lowest)
tac # reverse to be in descending order
DB 41533400 41533600 41533499 41533500 14
DR 41533300 41533800 41533506 41533507 3
AR 48743100 48743983 48743349 48743350 1
AR 41533200 41533800 41533508 41533509 1
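To see which row the de-duplication stage drops, the first two stages can be run on their own. The following is only a sketch: it recreates the question's sample data in a scratch directory, uses intermediate files instead of process substitution so it also runs in a plain POSIX shell, and writes to a file named `candidates.txt`, which is a name chosen here for illustration.

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)" || exit 1

# Sample data copied from the question.
cat > file1 <<'EOF'
DB 41533499 41533500 14
CD 41533500 41533501 3
AR 41533504 41533505 5
DR 41533506 41533507 3
AR 41533508 41533509 1
AR 48743349 48743350 1
EOF
cat > file2 <<'EOF'
DB 41533400 41533600
DR 41533300 41533800
AR 41533200 41533800
AR 48743100 48743983
EOF

# Use one collation order for both sort and join.
export LC_ALL=C
sort file1 > file1.sorted
sort file2 > file2.sorted

# Join on the 1st field, then keep only points strictly inside the range.
join file2.sorted file1.sorted | awk '$2 < $4 && $3 > $5' > candidates.txt
cat candidates.txt
```

Two of the surviving rows share the range `AR 41533200 41533800`; the `sort -k6n | awk '!a[$2]++'` stages keep only the one with the smaller last field, which is why the row ending in 5 does not appear in the output above.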
Upvotes: 0
Reputation: 37394
Here's one for GNU awk, but I share @RomanPerekhrest's question about the record AR 41533504 41533505 5:
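$ awk 'NR==FNR{                      # first file (file2): store the ranges
    a[$1][$2]=$3; next               # a[key][start] = end
}
($1 in a) {                          # file1 rows whose key also occurs in file2
    for(i in a[$1])                  # i is a string index; i+0 forces a numeric comparison
        if($2>=i+0 && $3<=a[$1][i])
            print $1,i,a[$1][i],$2,$3,$4
}' file2 file1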
DB 41533400 41533600 41533499 41533500 14
AR 41533200 41533800 41533504 41533505 5
DR 41533300 41533800 41533506 41533507 3
AR 41533200 41533800 41533508 41533509 1
AR 48743100 48743983 48743349 48743350 1
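Arrays of arrays are a GNU awk extension. If gawk is not available, the same range test can be sketched with plain parallel arrays, which should work in any POSIX awk. The array names `key`, `lo` and `hi` and the output file `portable.txt` are made up for this example, and the sample data is recreated in a scratch directory:

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)" || exit 1

# Sample data copied from the question.
cat > file1 <<'EOF'
DB 41533499 41533500 14
CD 41533500 41533501 3
AR 41533504 41533505 5
DR 41533506 41533507 3
AR 41533508 41533509 1
AR 48743349 48743350 1
EOF
cat > file2 <<'EOF'
DB 41533400 41533600
DR 41533300 41533800
AR 41533200 41533800
AR 48743100 48743983
EOF

awk '
NR == FNR {                 # first file (file2): remember every range
    n++
    key[n] = $1; lo[n] = $2; hi[n] = $3
    next
}
{                           # second file (file1): test the point against each range
    for (j = 1; j <= n; j++)
        if ($1 == key[j] && $2 + 0 >= lo[j] + 0 && $3 + 0 <= hi[j] + 0)
            print $1, lo[j], hi[j], $2, $3, $4
}' file2 file1 > portable.txt
cat portable.txt
```

This scans every range for every point, so it is only reasonable for small inputs. Like the gawk version, it keeps the AR 41533504 41533505 5 record.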
Upvotes: 1