Joe

Reputation: 1388

Scripting: How to put together lines from files based on matching column?

I have two files with lines like this:

File A:

TCONS_00000007  ENSMUST00000044158  gene:ENSMUSG00000041560 433/463 0.0 364.0
TCONS_00000009  ENSMUST00000044158  gene:ENSMUSG00000041560 1051/1122   0.0 890.0
TCONS_00000212  ENSMUST00000112323  gene:ENSMUSG00000032582 458/475 0.0 420.0
TCONS_00000636  ENSMUST00000061242  gene:ENSMUSG00000048076 1694/1751   0.0 1571.0
TCONS_00000636  ENSMUST00000163300  gene:ENSMUSG00000048076 1658/1713   0.0 1539.0

File B:

chr1    4675000 4675009 TCONS_00000007
chr1    4677953 4678274 TCONS_00000008
chr1    4677956 4679079 TCONS_00000009
chr1    43944821    43946606    TCONS_00000636

EDIT: Column 4 in File B is unique. Column 1 in File A is not necessarily unique, though.

What I'd like to do is output a file that keeps only the lines where column 1 of File A matches column 4 of File B. Duplicates are allowed. For the example above, I'd want the output to look like this:

chr1    4675000 4675009 TCONS_00000007  ENSMUST00000044158  gene:ENSMUSG00000041560 
chr1    43944821    43946606     TCONS_00000636 ENSMUST00000061242  gene:ENSMUSG00000048076
chr1    43944821    43946606     TCONS_00000636 ENSMUST00000163300  gene:ENSMUSG00000048076

So I tried using awk to do this... and I'm stuck.

FNR==NR{ ### script.awk
    array[$4]++
    next
}

{
    if ($1 in array){
        print $1,$2,$3...
    }
}
awk -f script.awk fileB fileA > fileC

What I can't get right is the printing part. As written, this keeps the lines from fileA that I want, but I can't think of a way to also get columns $1, $2, $3 of fileB in there (typing $1, $2, $3 obviously won't work, because at that point they refer to the fields of the current fileA line). What can I do?

Upvotes: 0

Views: 166

Answers (2)

Kent

Reputation: 195039

From your current script, it looks like $4 is unique in fileB, so you could try this modified version of your script:

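FNR==NR{ ### script.awk
    # first file (fileB): store the whole line, keyed by column 4
    array[$4]=$0
    next
}

{
    # second file (fileA): on a key match, print the stored fileB line
    # followed by columns 2 and 3 of the fileA line
    if ($1 in array){
        print array[$1],$2,$3
    }
}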

Then run:

awk -f script.awk fileB fileA > fileC
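
For anyone who wants to try this end to end, here is a minimal sketch using the sample data from the question. It assumes whitespace-separated columns and that only columns 2 and 3 of fileA are wanted after the fileB line, as in the desired output shown in the question. The array name `bline` is just an illustrative choice, and the files are written to the current directory.

```shell
#!/bin/sh
# Sample data copied from the question.
cat > fileA <<'EOF'
TCONS_00000007  ENSMUST00000044158  gene:ENSMUSG00000041560 433/463 0.0 364.0
TCONS_00000009  ENSMUST00000044158  gene:ENSMUSG00000041560 1051/1122   0.0 890.0
TCONS_00000212  ENSMUST00000112323  gene:ENSMUSG00000032582 458/475 0.0 420.0
TCONS_00000636  ENSMUST00000061242  gene:ENSMUSG00000048076 1694/1751   0.0 1571.0
TCONS_00000636  ENSMUST00000163300  gene:ENSMUSG00000048076 1658/1713   0.0 1539.0
EOF

cat > fileB <<'EOF'
chr1    4675000 4675009 TCONS_00000007
chr1    4677953 4678274 TCONS_00000008
chr1    4677956 4679079 TCONS_00000009
chr1    43944821    43946606    TCONS_00000636
EOF

# fileB is read first (FNR == NR holds only for the first file):
# remember each whole line under its column-4 key.
# For every fileA line whose column 1 is a known key, print the
# remembered fileB line followed by fileA columns 2 and 3.
awk '
    FNR == NR     { bline[$4] = $0; next }
    ($1 in bline) { print bline[$1], $2, $3 }
' fileB fileA > fileC

cat fileC
```

With this data fileC ends up with four lines, in fileA order: one for TCONS_00000007, one for TCONS_00000009, and two for TCONS_00000636. TCONS_00000009 is missing from the desired output in the question, but it appears in both files, so a pure key match keeps it. Because the fileB line is stored as `$0`, its original spacing is preserved, while the appended fileA columns are joined with single spaces.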

Upvotes: 1

Gilles Quénot

Reputation: 185015

Try this:

awk '
    NR==FNR{v=$1;$1="";arr[v]=$0}
    NR!=FNR{v=$4;$4="";arr[v]=arr[v] $0}
    END{for (a in arr) print a, arr[a]}
' A B

Upvotes: 0
