Reputation: 33
I have two types of files. For example, file type 1 (Hsrr610_mult_notab.ko):
K00002 2023649
K00002 2643896
K00006 1614154
K00006 600734
K00008 1562227
K00012 1353687
File type 2 (Hsrr610.out, extracted from multiple ko_*.ko files):
K00002 ko00010,ko00040,ko00561,ko00930,ko01100,ko01110,ko01120,ko01130,ko01220
K00006 ko00564,ko01110,ko04011
K00008 ko00040,ko00051,ko01100
K00012 ko00040,ko00053,ko00520,ko01100
Below is the script I wrote. When the same KXXXXX key appears in both files, it should append the koXXXXX strings from the second file (the whole comma-separated list counts as a single string) to every matching line of the first file, like this:
K00002 2023649 ko00010,ko00040,ko00561,ko00930,ko01100,ko01110,ko01120,ko01130,ko01220
K00002 2643896 ko00010,ko00040,ko00561,ko00930,ko01100,ko01110,ko01120,ko01130,ko01220
K00006 1614154 ko00564,ko01110,ko04011
K00006 600734 ko00564,ko01110,ko04011
K00008 1562227 ko00040,ko00051,ko01100
K00012 1353687 ko00040,ko00053,ko00520,ko01100
However, the script does not work correctly:
#!/usr/bin/bash
for i in ko_*.ko
do
r="$(echo $i | sed s/ko_// | sed s/.ko// )";
echo $(echo "$r " && cat $i | sed ':a;N;$!ba;s/\n/,/g' ) > $r.csvt
done
cat *.csvt > Hsrr610.out && rm *.csvt
for j in $(cat Hsrr610.out)
do
k="$(echo $j | grep "K[0-9]*" | sed s/\n/0/g | sed s/\t//g)"
l="$(echo $j | grep "ko*")"
echo $k
awk -v one="$k" -v two=" $j" '{if (/one/) {$0=$0 two}; print}' Hsrr610_mult_notab.ko > out
done
Thanks,
Upvotes: 3
Views: 52
Reputation: 67467
If both files are sorted on the key (the first field), that is exactly what join is for.
$ join file1 file2
K00002 2023649 ko00010,ko00040,ko00561,ko00930,ko01100,ko01110,ko01120,ko01130,ko01220
K00002 2643896 ko00010,ko00040,ko00561,ko00930,ko01100,ko01110,ko01120,ko01130,ko01220
K00006 1614154 ko00564,ko01110,ko04011
K00006 600734 ko00564,ko01110,ko04011
K00008 1562227 ko00040,ko00051,ko01100
K00012 1353687 ko00040,ko00053,ko00520,ko01100
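If the inputs are not already sorted, they can be sorted on the first field before joining. The following is only a sketch under some assumptions: file1 and file2 are the placeholder names used above, the sample lines are a shuffled subset of the data in the question, and the .sorted file names are made up for illustration. Setting LC_ALL=C keeps sort and join on the same collation order.

```shell
#!/bin/sh
# Sketch: sort both inputs on the key, then join them.
set -eu
LC_ALL=C; export LC_ALL      # sort and join must agree on the ordering

workdir=$(mktemp -d)
cd "$workdir"

# Shuffled subset of the question's first file (key, number)
cat > file1 <<'EOF'
K00008 1562227
K00002 2023649
K00006 1614154
K00002 2643896
EOF

# Shuffled subset of the question's second file (key, ko list)
cat > file2 <<'EOF'
K00006 ko00564,ko01110,ko04011
K00002 ko00010,ko00040,ko00561,ko00930,ko01100,ko01110,ko01120,ko01130,ko01220
K00008 ko00040,ko00051,ko01100
EOF

# -k1,1 sorts on the first field only; -s (stable) keeps lines that
# share a key in their original relative order
sort -s -k1,1 file1 > file1.sorted
sort -s -k1,1 file2 > file2.sorted

joined=$(join file1.sorted file2.sorted)
printf '%s\n' "$joined"
```

Note that join, by default, drops lines whose key appears in only one of the two files.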
Upvotes: 2
Reputation: 133428
EDIT: Since the OP changed the requirement, I am adding this solution now.
awk 'FNR==NR{a[$1]=$NF;next} {print $0,a[$1]}' Hsrr610.out Hsrr610_mult_notab.ko
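For readers new to the FNR==NR idiom, here is the one-liner above unpacked with comments, as a rough sketch rather than a definitive version. The sample data is a subset of the question's data; the K99999 line is a made-up key added only to show the unmatched case, and the array name ko is my own choice. Unlike the one-liner, which prints unmatched lines with a trailing empty field, this variant prints them unchanged.

```shell
#!/bin/sh
# Sketch: look up each key of the first-type file in the second-type file.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

cat > Hsrr610.out <<'EOF'
K00006 ko00564,ko01110,ko04011
K00008 ko00040,ko00051,ko01100
EOF

# K99999 is a hypothetical key with no entry in Hsrr610.out
cat > Hsrr610_mult_notab.ko <<'EOF'
K00006 1614154
K00006 600734
K00008 1562227
K99999 42
EOF

result=$(awk '
  # First file (FNR==NR holds only while reading it): build key -> ko list
  FNR == NR { ko[$1] = $NF; next }
  # Second file: append the list when the key is known
  ($1 in ko) { print $0, ko[$1]; next }
  # Unknown key: print the line as it is
  { print }
' Hsrr610.out Hsrr610_mult_notab.ko)

printf '%s\n' "$result"
```

The lookup file is read first so that its whole table is in memory before any line of Hsrr610_mult_notab.ko is processed; neither file needs to be sorted.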
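The following awk may help you here. It collects the last field of every line that shares a key into one comma-separated list, keeping the keys in the order they first appear.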
awk '!b[$1]++{c[++count]=$1} {a[$1]=a[$1]?a[$1] OFS $NF:$NF} END{for(i=1;i<=count;i++){print c[i] FS a[c[i]]}}' OFS="," Input_file
Here is the same solution in a non-one-liner form.
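awk '
!b[$1]++{                          # first time this key is seen
  c[++count]=$1                    # remember the keys in input order
}
{
  a[$1]=a[$1]?a[$1] OFS $NF:$NF    # append the last field, separated by OFS (a comma)
}
END{
  for(i=1;i<=count;i++){
    print c[i] FS a[c[i]]          # key, a space, then the collected list
  }
}
' OFS="," Input_file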
Upvotes: 2