Reputation: 1388
I have two files with lines like this:
File A:
TCONS_00000007 ENSMUST00000044158 gene:ENSMUSG00000041560 433/463 0.0 364.0
TCONS_00000009 ENSMUST00000044158 gene:ENSMUSG00000041560 1051/1122 0.0 890.0
TCONS_00000212 ENSMUST00000112323 gene:ENSMUSG00000032582 458/475 0.0 420.0
TCONS_00000636 ENSMUST00000061242 gene:ENSMUSG00000048076 1694/1751 0.0 1571.0
TCONS_00000636 ENSMUST00000163300 gene:ENSMUSG00000048076 1658/1713 0.0 1539.0
File B:
chr1 4675000 4675009 TCONS_00000007
chr1 4677953 4678274 TCONS_00000008
chr1 4677956 4679079 TCONS_00000009
chr1 43944821 43946606 TCONS_00000636
EDIT: Column 4 in File B would be unique. Column 1 in File A wouldn't necessarily be though.
What I'd like to do is output a file that keeps only the lines where column 1 of A matches column 4 of B. Duplicates are allowed, so in the example above I'd want the output to look like this:
chr1 4675000 4675009 TCONS_00000007 ENSMUST00000044158 gene:ENSMUSG00000041560
chr1 4677956 4679079 TCONS_00000009 ENSMUST00000044158 gene:ENSMUSG00000041560
chr1 43944821 43946606 TCONS_00000636 ENSMUST00000061242 gene:ENSMUSG00000048076
chr1 43944821 43946606 TCONS_00000636 ENSMUST00000163300 gene:ENSMUSG00000048076
So I tried using awk to do this... and I'm stuck.
FNR==NR{ ### script.awk
array[$4]++
next
}
{
if ($1 in array){
print $1,$2,$3...
}
}
awk -f script.awk fileB fileA > fileC
What I'm having trouble with is getting the printing part to work. As you can see, this keeps the lines from fileA that I want, but I can't think of a way to also get the $1, $2, $3 columns of fileB in there (obviously typing in $1, $2, $3 won't work, since by then they refer to fileA). What can I do?
Upvotes: 0
Views: 166
Reputation: 195039
From your current script, it looks like $4 is unique in fileB, so you could try this modified script (based on your code):
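FNR==NR{ ### script.awk
    array[$4]=$0    # first file (fileB): store the whole line, keyed by column 4
    next
}
{
    if ($1 in array){
        # array[$1] already ends with the ID, so append only columns 2 and 3 of fileA
        print array[$1],$2,$3
    }
}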
Then run:
awk -f script.awk fileB fileA > fileC
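To check this lookup approach against the sample data from the question, here is a minimal end-to-end sketch. The scratch directory is an assumption, and the print line is filled in to match the columns in the question's desired output rather than taken from the script as posted.

```shell
#!/bin/sh
# Work in a scratch directory (an assumption, to avoid touching real files).
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# File A from the question: column 1 may repeat.
cat > fileA <<'EOF'
TCONS_00000007 ENSMUST00000044158 gene:ENSMUSG00000041560 433/463 0.0 364.0
TCONS_00000009 ENSMUST00000044158 gene:ENSMUSG00000041560 1051/1122 0.0 890.0
TCONS_00000212 ENSMUST00000112323 gene:ENSMUSG00000032582 458/475 0.0 420.0
TCONS_00000636 ENSMUST00000061242 gene:ENSMUSG00000048076 1694/1751 0.0 1571.0
TCONS_00000636 ENSMUST00000163300 gene:ENSMUSG00000048076 1658/1713 0.0 1539.0
EOF

# File B from the question: column 4 is unique.
cat > fileB <<'EOF'
chr1 4675000 4675009 TCONS_00000007
chr1 4677953 4678274 TCONS_00000008
chr1 4677956 4679079 TCONS_00000009
chr1 43944821 43946606 TCONS_00000636
EOF

# Lookup script: pass 1 stores fileB lines by ID, pass 2 filters and extends fileA lines.
cat > script.awk <<'EOF'
FNR == NR { array[$4] = $0; next }        # fileB: remember each whole line by its ID
$1 in array { print array[$1], $2, $3 }   # fileA: stored fileB line + columns 2 and 3
EOF

awk -f script.awk fileB fileA > fileC
cat fileC
```

With this sample the result has four lines, in the order of fileA: TCONS_00000009 appears in both files, so it is printed along with TCONS_00000007 and the two TCONS_00000636 rows.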
Upvotes: 1
Reputation: 185015
Try this:
awk '
NR==FNR{v=$1;$1="";arr[v]=$0}
NR!=FNR{v=$4;$4="";arr[v]=arr[v] $0}
END{for (a in arr) print a, arr[a]}
' A B
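Because column 1 of File A may repeat, reading A first needs the values accumulated per ID; a plain assignment keeps only the last line for each ID. The following is a hedged sketch of that variant, not the script above: the newline-separated list, the variable names, and the trimmed sample files are assumptions for illustration.

```shell
#!/bin/sh
# Scratch directory and trimmed sample data (both assumptions for this sketch).
tmp=$(mktemp -d)

cat > "$tmp/A" <<'EOF'
TCONS_00000007 ENSMUST00000044158 gene:ENSMUSG00000041560 433/463 0.0 364.0
TCONS_00000212 ENSMUST00000112323 gene:ENSMUSG00000032582 458/475 0.0 420.0
TCONS_00000636 ENSMUST00000061242 gene:ENSMUSG00000048076 1694/1751 0.0 1571.0
TCONS_00000636 ENSMUST00000163300 gene:ENSMUSG00000048076 1658/1713 0.0 1539.0
EOF

cat > "$tmp/B" <<'EOF'
chr1 4675000 4675009 TCONS_00000007
chr1 4677953 4678274 TCONS_00000008
chr1 43944821 43946606 TCONS_00000636
EOF

joined=$(awk '
# First file (A): collect "col2 col3" per ID in a newline-separated list,
# so repeated IDs are kept rather than overwritten.
NR == FNR {
    pair = $2 " " $3
    if ($1 in arr)
        arr[$1] = arr[$1] "\n" pair
    else
        arr[$1] = pair
    next
}
# Second file (B): for IDs seen in A, print one output line per stored pair.
$4 in arr {
    n = split(arr[$4], pairs, "\n")
    for (i = 1; i <= n; i++)
        print $0, pairs[i]
}
' "$tmp/A" "$tmp/B")

printf '%s\n' "$joined"
rm -rf "$tmp"
```

Printing while reading B, rather than looping over the array in an END block, keeps the output in the order of File B and skips IDs that occur in only one file.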
Upvotes: 0