chas
chas

Reputation: 1645

How to add multiple columns to a file from another file

I have two files like shown below which are tab-delimited:

file A

chr1   123 aa b c d
chr1   234 a  b c d
chr1   345 aa b c d
chr1   456 a  b c d
....

file B

chr1   123    aa    c    d    e   ff
chr1   345    aa    e    f    g   gg
chr1   123    aa    c    d    e   hh
chr1   567    aa    z    c    a   ii
chr1   345    bb    x    q    r   kk
chr1   789    df    f    g    s   ff
chr1   345    sh    d    t    g   ll

...

I want to add a new column to file A from file B based on 2 key columns "chr1", "123" i.e.(first two columns are key columns). If the key columns matches in both files, the data in column 7 in file B should be added to column 3 in file A.

For example (chr1 123) key is found twice in file B, therefore 3rd column in file A has ff and hh separated by comma. If the key is not found it should put NA and output should look like as shown below: output:

chr1   123  ff,hh       aa    b    c    d   
chr1   234    NA        a     b    c    d
chr1   345  gg,kk,ll    aa   b    c    d
chr1   456    NA        a    b    c     d

I achieved this using the awk solution

awk -F'\t' -v OFS='\t' 'NR==FNR{a[$1FS$2]=a[$1FS$2]?a[$1FS$2]","$7:$7;next}{$3=(($1FS$2 in a)?a[$1FS$2]:"NA")FS $3}1' fileB fileA

Now, i would like to add another column 6 along with column 7. Could anyone suggest how to do this? The output looks like:

chr1   123  ff,hh    e,e     aa    b    c    d   
chr1   234    NA     NA      a     b    c    d
chr1   345  gg,kk,ll g,r,g   aa   b    c    d
chr1   456    NA      NA     a    b    c     d

Thanks

Upvotes: 1

Views: 286

Answers (1)

n0741337
n0741337

Reputation: 2504

My suggestion is to use another array to track the next variable you want to add, but to keep the code a little more readable, I've made an executable awk script to generalize it a bit:

#!/usr/bin/awk -f

BEGIN { FS="\t"; OFS="\t" }

{ key = $1 FS $2 }

FNR==NR {
    updateArray( a, $7 )
    updateArray( b, $6 )
    next
}

{ $3 = concat( a, concat( b, $3 ) ) }

1

function updateArray( arr, fld ) {
    arr[key] = arr[key]!="" ? arr[key] "," fld : fld
}

function concat( arr, suffix ) {
  return( (arr[key]=="" ? "NA" : arr[key]) OFS suffix )
}

Here's the breakdown:

  • Set the FS and OFS values
  • Make a global key for every line read
  • Store data from the first file in arrays a and b where they are passed by reference to the function updateArray and the field value is passed by value
  • Update $3 using the local concat function
  • Print the updated line out with 1

As another option, you could make the value stored in a single a[key] equal to all the file B fields you want represented in $3 and have them separated by OFS. That would require parsing and reassembling the value in a[key] every time it changed as file B is parsed, but would make creating the $3 a simple three part concatenation.

Upvotes: 0

Related Questions