Vonton
Vonton

Reputation: 3382

Join two files together (AWK)

I have one problem. I would like to merge two files together. Where:

File 1:

 chr**1  10000**   rs200132 A  C  100.000
 chr**2  20000**   rs5000   C  G   80.000

File 2:

 rs200132  **1:10000**  A   800   200  Nmf 
 rs210111  **1:10000**  G   200   800  VFC 
 rs310000  **1:10000**  C   100   500  tff
 rs50001   **2:20000**  T   500   100  jpp
 rs60000   **2:20000**  A   1000   10  jkl

Output:

 chr**1  10000**  rs200132  A  A  C   800   200  Nmf
 chr**1  10000**  rs210111  G  A  C   200   800  VFC
 chr**1  10000**  rs310000  C  A  C   100   500  tff
 chr**2  20000**  rs50001   T  C  G   500   100  jpp
 chr**2  20000**  rs60000   A  C  G   1000   10  jkl

Than, from the first file are the marks number after "chr" and in the second column the number. The same marks are in the second file, but there is in second column like 1:10000. I would like to join this two file but for the first file will be much more joined rows (f.e.: for the first row from the first file will be three rows from the second file.) Thank you

Upvotes: 0

Views: 149

Answers (3)

Steve
Steve

Reputation: 54592

Here's another way using Perl:

perl -lane '
    if (@ARGV) {
        ($x = $F[0]) =~ s/[^\d]*//;
        $h{$x}{$F[1]} = [ @F[0,1,3,4] ]
    }
    else {
        @t = split(":", $F[1]);
        $r = $h{$t[0]}{$t[1]};
        print join(" ", @$r[0,1], @F[0,2], @$r[2,3], @F[3..5])
    }
' file1 file2 | column -t

Results:

chr1  10000  rs200132  A  A  C  800   200  Nmf
chr1  10000  rs210111  G  A  C  200   800  VFC
chr1  10000  rs310000  C  A  C  100   500  tff
chr2  20000  rs50001   T  C  G  500   100  jpp
chr2  20000  rs60000   A  C  G  1000  10   jkl

Upvotes: 0

mpapec
mpapec

Reputation: 50677

perl -lane'
  BEGIN{ $x=pop; %h = map{ $_->[1] => $_ } map [split], <>; @ARGV=$x }
  $F[1] =~ s/.+?://;
  $t = $h{$F[1]};
  print join " ", @$t[0,1], @F[0,2], @$t[3,4], @F[3..5];
' file1 file2

output

chr**1 10000** rs200132 A A C 800 200 Nmf
chr**1 10000** rs210111 G A C 200 800 VFC
chr**1 10000** rs310000 C A C 100 500 tff
chr**2 20000** rs50001 T C G 500 100 jpp
chr**2 20000** rs60000 A C G 1000 10 jkl

Upvotes: 4

sat
sat

Reputation: 14979

You can use this awk,

awk 'NR==FNR{a[$2]=$1;b[$2]=$4" "$5;next} {sub(/.*:/,"",$2); $3=$1" "$3" "b[$2]; $1=a[$2];}1' file1 file2

Test:

sat:~# awk 'NR==FNR{a[$2]=$1;b[$2]=$4" "$5;next} {sub(/.*:/,"",$2); $3=$1" "$3" "b[$2]; $1=a[$2];}1' file1 file2
chr**1 10000** rs200132 A A C 800 200 Nmf
chr**1 10000** rs210111 G A C 200 800 VFC
chr**1 10000** rs310000 C A C 100 500 tff
chr**2 20000** rs50001 T C G 500 100 jpp
chr**2 20000** rs60000 A C G 1000 10 jkl

Upvotes: 3

Related Questions