shb

Reputation: 33

Add strings from one file to another when they share a common column

I have two types of files. For example, file type 1 (Hsrr610_mult_notab.ko) contains:

K00002 2023649
K00002 2643896
K00006 1614154
K00006 600734
K00008 1562227
K00012 1353687

File type 2 (Hsrr610.out, extracted from multiple ko_*.ko files) contains:

K00002 ko00010,ko00040,ko00561,ko00930,ko01100,ko01110,ko01120,ko01130,ko01220
K00006 ko00564,ko01110,ko04011
K00008 ko00040,ko00051,ko01100
K00012 ko00040,ko00053,ko00520,ko01100

I want to check whether a KXXXXX key in the first file also appears in the second file and, if it does, append the matching koXXXXX strings to that line of the first file (the comma-separated koXXXXX list should be treated as a single string), like this:

K00002 2023649 ko00010,ko00040,ko00561,ko00930,ko01100,ko01110,ko01120,ko01130,ko01220
K00002 2643896 ko00010,ko00040,ko00561,ko00930,ko01100,ko01110,ko01120,ko01130,ko01220
K00006 1614154 ko00564,ko01110,ko04011
K00006 600734 ko00564,ko01110,ko04011
K00008 1562227 ko00040,ko00051,ko01100
K00012 1353687 ko00040,ko00053,ko00520,ko01100

Here's the script I wrote to do this, but it doesn't work correctly:

#!/usr/bin/bash
for i in ko_*.ko
do
r="$(echo $i | sed s/ko_// | sed s/.ko// )";
echo $(echo "$r " && cat $i | sed ':a;N;$!ba;s/\n/,/g' ) > $r.csvt
done
cat *.csvt > Hsrr610.out && rm *.csvt
for j in $(cat Hsrr610.out)
do 
k="$(echo $j | grep "K[0-9]*" | sed s/\n/0/g | sed s/\t//g)" 
l="$(echo $j | grep "ko*")" 
echo $k
awk -v one="$k" -v two=" $j" '{if (/one/) {$0=$0  two}; print}' Hsrr610_mult_notab.ko > out
done

Thanks,

Upvotes: 3

Views: 52

Answers (2)

karakfa

Reputation: 67467

If the keys are sorted, that's what join is for.

$ join file1 file2

K00002 2023649 ko00010,ko00040,ko00561,ko00930,ko01100,ko01110,ko01120,ko01130,ko01220
K00002 2643896 ko00010,ko00040,ko00561,ko00930,ko01100,ko01110,ko01120,ko01130,ko01220
K00006 1614154 ko00564,ko01110,ko04011
K00006 600734 ko00564,ko01110,ko04011
K00008 1562227 ko00040,ko00051,ko01100
K00012 1353687 ko00040,ko00053,ko00520,ko01100
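If the files are not already sorted on the key, join can miss matches, so a sort step is usually needed first. The following is a minimal sketch of that, not part of the original answer: the scratch directory, the file names file1 and file2, and the variable `joined` are assumptions for illustration, and the sample lines are taken from the question but deliberately placed out of order.

```shell
# Scratch directory so nothing in the current directory is touched.
tmp=$(mktemp -d)

# A few lines from the question, intentionally not in key order.
printf '%s\n' 'K00006 1614154' 'K00002 2023649' 'K00008 1562227' > "$tmp/file1"
printf '%s\n' \
  'K00008 ko00040,ko00051,ko01100' \
  'K00002 ko00010,ko00040,ko00561,ko00930,ko01100,ko01110,ko01120,ko01130,ko01220' \
  'K00006 ko00564,ko01110,ko04011' > "$tmp/file2"

# join expects both inputs sorted on the join field (field 1 by default).
# Using the same locale for sort and join keeps their ordering consistent.
LC_ALL=C sort -k1,1 "$tmp/file1" > "$tmp/file1.sorted"
LC_ALL=C sort -k1,1 "$tmp/file2" > "$tmp/file2.sorted"

joined=$(LC_ALL=C join "$tmp/file1.sorted" "$tmp/file2.sorted")
printf '%s\n' "$joined"

rm -r "$tmp"
```

Note that sorting changes the line order of the first file; if the original order matters, the awk approach in the other answer avoids that.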

Upvotes: 2

RavinderSingh13

Reputation: 133428

EDIT: Since the OP changed the requirement, I'm adding this solution now.

awk 'FNR==NR{a[$1]=$NF;next} {print $0,a[$1]}' Hsrr610.out Hsrr610_mult_notab.ko
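To spell out how this works: FNR==NR is true only while awk is reading the first file named on the command line, so the first block stores each ko list under its K key, and the second block runs on the other file and prints each line followed by the stored list. Below is a small self-contained run on a few lines from the question; the scratch directory and the variable `result` are assumptions added for illustration.

```shell
# Scratch directory holding small copies of the two files from the question.
tmp=$(mktemp -d)
printf '%s\n' \
  'K00002 ko00010,ko00040,ko00561,ko00930,ko01100,ko01110,ko01120,ko01130,ko01220' \
  'K00006 ko00564,ko01110,ko04011' > "$tmp/Hsrr610.out"
printf '%s\n' 'K00002 2023649' 'K00002 2643896' 'K00006 1614154' > "$tmp/Hsrr610_mult_notab.ko"

result=$(awk '
  # First file only: remember the ko list (last field) for each K key.
  FNR == NR { a[$1] = $NF; next }
  # Second file: print the line followed by the list stored for its key.
  { print $0, a[$1] }
' "$tmp/Hsrr610.out" "$tmp/Hsrr610_mult_notab.ko")
printf '%s\n' "$result"

rm -r "$tmp"
```

A key that is missing from Hsrr610.out would be printed with an empty last field (a trailing space); testing `$1 in a` before printing is one way to handle that case differently.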


The following awk may help you here.

awk '!b[$1]++{c[++count]=$1} {a[$1]=a[$1]?a[$1] OFS $NF:$NF} END{for(i=1;i<=count;i++){print c[i] FS a[c[i]]}}' OFS=","  Input_file

Adding a non-one-liner form of the solution too.

awk '
!b[$1]++{
  c[++count]=$1
}
{
  a[$1]=a[$1]?a[$1] OFS $NF:$NF
}
END{
  for(i=1;i<=count;i++){
    print c[i] FS a[c[i]]
  }
}
' OFS=","  Input_file
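As a rough illustration of what this grouping command produces, here is a sketch run on a hypothetical Input_file. The answer does not say what Input_file looks like, so the layout used here (one key and one ko value per line, built by splitting two lists from the question) is an assumption, as are the scratch directory and the variable `grouped`.

```shell
# Scratch directory with a hypothetical Input_file: one "key value" pair per line.
tmp=$(mktemp -d)
printf '%s\n' \
  'K00006 ko00564' 'K00008 ko00040' \
  'K00006 ko01110' 'K00008 ko00051' \
  'K00006 ko04011' 'K00008 ko01100' > "$tmp/Input_file"

grouped=$(awk '
  # Record each key the first time it appears, to keep first-seen order.
  !b[$1]++ { c[++count] = $1 }
  # Append the last field to the list for this key, separated by OFS.
  { a[$1] = (a[$1] ? a[$1] OFS $NF : $NF) }
  END {
    for (i = 1; i <= count; i++) {
      print c[i] FS a[c[i]]
    }
  }
' OFS="," "$tmp/Input_file")
printf '%s\n' "$grouped"

rm -r "$tmp"
```

The values for each key are collected even when its lines are not adjacent, and keys come out in the order they were first seen rather than in sorted order.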

Upvotes: 2
