Reputation: 561

Find rows with duplicate columns and merge another column in bash

Hi, I need to do what is shown in the example below.

Input file:

chr17   41246351    41246352    NM_007294_Exon_10
chr17   41246351    41246352    NM_007297_Exon_9
chr17   41246351    41246352    NM_007300_Exon_10
chr17   41246351    41246352    NR_027676_Exon_10
chr17   41246352    41246353    NM_007294_Exon_10
chr17   41246352    41246353    NM_007297_Exon_9
chr17   41246352    41246353    NM_007300_Exon_10

Desired output:

chr17   41246351    41246352    NM_007294_Exon_10,NM_007297_Exon_9,NM_007300_Exon_10,NR_027676_Exon_10
chr17   41246352    41246353    NM_007294_Exon_10,NM_007297_Exon_9,NM_007300_Exon_10

I tried using uniq and sort, but without success. Thank you for any help.

Upvotes: 0

Answers (3)

Sundeep

Reputation: 23667

$ perl -ne '($k,$v)=/^(.*\s)(\S+)$/; $h{$k} .= "$v,";
            END{print "$_$h{$_}\n" foreach keys %h }' ip.txt
chr17   41246351    41246352    NM_007294_Exon_10,NM_007297_Exon_9,NM_007300_Exon_10,NR_027676_Exon_10,
chr17   41246352    41246353    NM_007294_Exon_10,NM_007297_Exon_9,NM_007300_Exon_10,

This leaves a trailing comma, though. It can be removed by piping the output through sed 's/,$//'.

Alternatively, use the ?: conditional operator to add the comma only when needed (similar to the logic @sat uses in the awk solution). This needs no post-processing to remove a trailing comma:

$ perl -ne '($k,$v)=/^(.*\s)(\S+)$/; $h{$k} .= $h{$k}?",$v":"$v";
            END{print "$_$h{$_}\n" foreach keys %h }' ip.txt

Upvotes: 1

sat

Reputation: 14949

You can use this awk command:

awk '{i=$1 FS $2 FS $3} {a[i]=!a[i]?$4:a[i] FS $4} END {for (l in a) {print l,a[l]}}' file

If you want the last column to be comma-separated:

awk '{i=$1 FS $2 FS $3} {a[i]=!a[i]?$4:a[i] "," $4} END {for (l in a) {print l,a[l]}}' file
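Note that `for (l in a)` visits keys in an unspecified order, so the merged rows can come out in a different order than the input. Here is a sketch of a variant that keeps first-seen order, under a few assumptions of mine: the function name `merge_exons`, the `order` array, and tab-separated output are all illustrative choices, not something the question requires:

```shell
# Hypothetical helper: merge rows sharing columns 1-3, joining column 4 with commas.
merge_exons() {
    awk '
    {
        key = $1 "\t" $2 "\t" $3          # assumed output separator: tab
        if (!(key in merged)) {           # first time this key is seen
            order[++n] = key              # remember input order
            merged[key] = $4
        } else {
            merged[key] = merged[key] "," $4
        }
    }
    END {
        for (i = 1; i <= n; i++)
            print order[i] "\t" merged[order[i]]
    }' "$@"
}
```

Usage would be `merge_exons file`. Testing with `key in merged` instead of `!a[i]` also means a value such as `0` in column 4 is not mistaken for an unset entry.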

Upvotes: 2

Syerad

Reputation: 113

Try using awk:

awk '!seen[$2]++' testfile

Hope this helps!

Upvotes: 0
