Reputation: 2157
I'm working with a table that looks like this:
C1 C2 C3
1 a b
2 c d
4 e g
4 f h
5 x y
... ... ...
If two rows have the same value in C1 (in this example, 4 appears twice), I want the C2 and C3 values of the later row appended to the first row with that C1 value, and the later row removed. In the end it should look like this:
C1 C2 C3
1 a b
2 c d
4 e,f g,h
5 x y
I'm working with a Perl script and using a while loop to read through the file. I've used things like my %seen or a counter in other scripts, but I can't figure out how to use them now. It looks really simple to do ...
This is what my while loop looks like at the moment:
while (<$DATA>) {
    @columns = split;
    $var1 = $columns[0];
    $var2 = $columns[1];
    $var3 = $columns[2];
}
Upvotes: 0
Views: 247
Reputation: 13792
Use a hash to keep track of the duplicates. My example uses a hash of hashes (%info) keyed on the C1 value. Each inner hash has the keys C2 and C3, and each of those holds an array reference that collects the values from every row sharing that C1 value.
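To make the shape of that structure concrete, here is a small sketch of what %info would hold once the five sample rows have been read. The literal is written out by hand purely for illustration, and $merged_c2 is a name made up for this example.

```perl
use strict;
use warnings;

# Hand-written illustration of the hash of hashes for the sample rows.
# Outer key: the C1 value. Inner keys: C2 and C3, each an array reference.
my %info = (
    1 => { C2 => ['a'],      C3 => ['b'] },
    2 => { C2 => ['c'],      C3 => ['d'] },
    4 => { C2 => ['e', 'f'], C3 => ['g', 'h'] },
    5 => { C2 => ['x'],      C3 => ['y'] },
);

# Joining one inner array produces the merged cell for that column.
my $merged_c2 = join ',', @{ $info{4}{C2} };
print "$merged_c2\n";
```

The duplicated key 4 ends up with two elements in each array, which is what makes the later join produce "e,f" and "g,h".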
use strict;
use warnings;

my %info = ();
while (<DATA>) {
    my @columns = split /\s+/;
    if ( exists $info{ $columns[0] } ) {
        # C1 value already seen: append to the existing arrays
        push @{ $info{ $columns[0] }->{C2} }, $columns[1];
        push @{ $info{ $columns[0] }->{C3} }, $columns[2];
    }
    else {
        # first time this C1 value appears: start new arrays
        $info{ $columns[0] } = { C2 => [ $columns[1] ], C3 => [ $columns[2] ] };
    }
}

# print one line per C1 value, in numeric order
foreach my $c1 ( sort { $a <=> $b } keys %info ) {
    print $c1, "\t",
        join( ',', @{ $info{$c1}->{C2} } ), "\t",
        join( ',', @{ $info{$c1}->{C3} } ), "\n";
}
__DATA__
1 a b
2 c d
4 e g
4 f h
5 x y
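If C1 is not always numeric, or the rows should come out in the order they first appear rather than sorted, a variant along these lines might help. It is only a sketch: merge_rows and @order are names invented here, it assumes every data line has three whitespace-separated fields, and it relies on Perl's autovivification instead of the explicit exists/else branch.

```perl
use strict;
use warnings;

# merge_rows is a hypothetical helper, not part of the code above.
# It takes a list of lines and returns the merged lines in first-seen order.
sub merge_rows {
    my @lines = @_;
    my ( %info, @order );
    for my $line (@lines) {
        my ( $c1, $c2, $c3 ) = split ' ', $line;
        next unless defined $c3;                      # skip blank or short lines
        push @order, $c1 unless exists $info{$c1};    # remember first appearance
        push @{ $info{$c1}{C2} }, $c2;                # autovivification creates the arrays
        push @{ $info{$c1}{C3} }, $c3;
    }
    my @out;
    for my $c1 (@order) {
        push @out, join "\t", $c1,
            join( ',', @{ $info{$c1}{C2} } ),
            join( ',', @{ $info{$c1}{C3} } );
    }
    return @out;
}

print "$_\n" for merge_rows( "1 a b", "2 c d", "4 e g", "4 f h", "5 x y" );
```

Because the order comes from @order rather than from a numeric sort, a header line such as "C1 C2 C3" would pass through as an ordinary row without triggering a "not numeric" warning.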
Upvotes: 2