user1987607

Reputation: 2157

If one column has duplicate values, copy the values from the other columns to the line above

I'm working with a table that looks like this

C1    C2    C3
1     a     b
2     c     d
4     e     g
4     f     h
5     x     y
...   ...   ...

If values in C1 are the same (in this example, 4 appears twice), then I want the values of C2 and C3 to be appended to the first line with 4 in C1, and the second line with 4 in C1 to be removed. In the end it should look like this:

C1    C2    C3
1     a     b
2     c     d
4     e,f   g,h
5     x     y

I'm working with a Perl script and using while to loop through the file. I've used things like my %seen or a counter in other scripts, but I can't figure out how to use them now. It looks really simple to do ...

This is what my while loop looks like at the moment:

 while (<$DATA>) {
    my @columns = split;
    my $var1 = $columns[0];
    my $var2 = $columns[1];
    my $var3 = $columns[2];
 }

Upvotes: 0

Views: 247

Answers (1)

Miguel Prz

Reputation: 13792

Use a hash to track the duplicates. In my example I use a hash of hashes (%info), keyed by the C1 value, with inner keys C2 and C3. Each inner key holds an array reference that collects the values from the duplicated rows.

use strict;
use warnings;

my %info = ();
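while (<DATA>) {
    # split ' ' splits on whitespace and ignores leading whitespace
    my @columns = split ' ';
    next unless @columns >= 3;    # skip blank or short lines
    if( exists $info{ $columns[0] } ) {
        # C1 value already seen: append to the existing lists
        push @{ $info{ $columns[0] }->{C2} }, $columns[1];
        push @{ $info{ $columns[0] }->{C3} }, $columns[2];
    }
    else {
        # first time this C1 value appears: start new lists
        $info{ $columns[0] } = { C2 =>[ $columns[1] ], C3 => [ $columns[2]] };
    }
}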

foreach my $c1(sort {$a<=>$b} keys %info ) {
    print $c1, "\t", 
          join(',',@{$info{$c1}->{C2}}), "\t", 
          join(',',@{$info{$c1}->{C3}}), "\n";
} 


__DATA__
1     a     b
2     c     d
4     e     g
4     f     h
5     x     y
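
Note that the output loop sorts the keys numerically, so the original row order is lost if the file is not already sorted by C1, and non-numeric C1 values produce warnings. Here is a minimal sketch of a variant that keeps the order in which each C1 value first appears. The names merge_rows and @order are made up for illustration, and I'm assuming each line has exactly three whitespace-separated columns:

```perl
use strict;
use warnings;

# merge_rows is a hypothetical helper, not part of the code above.
# It takes a list of lines and returns the merged lines, in the
# order in which each C1 value was first seen.
sub merge_rows {
    my @lines = @_;
    my (%info, @order);

    for my $line (@lines) {
        my ($c1, $c2, $c3) = split ' ', $line;
        next unless defined $c3;                    # skip blank or short lines
        push @order, $c1 unless exists $info{$c1};  # remember first appearance
        push @{ $info{$c1}{C2} }, $c2;              # autovivification creates the arrays
        push @{ $info{$c1}{C3} }, $c3;
    }

    my @out;
    for my $c1 (@order) {
        push @out, join("\t",
            $c1,
            join(',', @{ $info{$c1}{C2} }),
            join(',', @{ $info{$c1}{C3} }),
        );
    }
    return @out;
}

print "$_\n" for merge_rows("1 a b", "2 c d", "4 e g", "4 f h", "5 x y");
```

To run it against a file, you could pass the lines read from your filehandle, e.g. merge_rows(<$DATA>). That reads the whole file into memory, which should be fine unless the table is very large.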

Upvotes: 2
