Reputation: 61
I have two files.
File 1 (shortened):
1 1:135982[b38]A,G 0 12500 A G
1 1:611317[b38]A,G 0 611317 A G
1 1:722408[b38]G,C 0 722408 G C
1 1:726649[b38]G,A 0 726649 G A
File 2 (shortened):
ENSG00000223972 ENST00000456328 DDX11L1 1 11869 14409
ENSG00000223972 ENST00000450305 DDX11L1 1 11869 14409
ENSG00000227232 ENST00000488147 WASH7P 1 14404 29570
I want to create a new file where:
Print column 3 of file 2 and column 2 of file 1.
Example:
For the first line of file 1, column 1 (1) matches column 4 of file 2 (also 1) AND column 4 of file 1 (12500) is >= 11869 and <= 14409, so the desired output would be:
DDX11L1 1:134982[b38]A,G
No other matches exist in this example so that would be the sole output.
How can I do this with awk?
Upvotes: 0
Views: 88
Reputation: 16662
Not awk but trivial with sqlite:
#!/bin/bash
# convert spaces to tabs
tr -s ' \t' '\t' <file1.raw >file1.tsv
tr -s ' \t' '\t' <file2.raw >file2.tsv
sqlite3 >output.tsv <<'EOD'
create table t1 (c1,c2,c3,c4 int,c5,c6);
create table t2 (c1,c2,c3,c4,c5 int,c6 int);
.mode tabs
.import file1.tsv t1
.import file2.tsv t2
select distinct t2.c3, t1.c2 from t1,t2
where t1.c1=t2.c4 and t1.c4 between t2.c5 and t2.c6;
EOD
Upvotes: 2