Reputation: 2157
I have a tab-delimited file (with header line) in which I want to make the sum of some values based on the presence of data from another column.
This is how my table looks like:
C1 C2 C3 C4
a x e 1
b x 3
c y 1
d z f 4
For all the identical values in C2 I want to make the sum of the corresponding values in C3. So the new table should look like this:
C1 C2 C3 C4 C5
a x e 1 4
b x 3 4
c y 1 1
d z f 4 4
So I want to keep the original structure of my table (columns C1,C2,C3,C4 and identical number of lines), but at the end I want to add another column, where in this case for line a and b the values from C4 are summed up and put in C5, since the values for C2 are identical. The values in C5 for line c&d are the same as C4, since the y&z in C2 are unique.
Can anyone help me how to do this in Perl? I suppose I should loop through the file and create a variable $sum in which by going through each line he adds up the number of C4 and then puts in C5.
Upvotes: 0
Views: 145
Reputation: 15121
Read the data file line by line, push every line to an array, and record the value of column 4 in a hash, using column 2 as key. When you finished, iterate that array, print its element and corresponding column 4 one by one.
#!/usr/bin/perl
use strict;
use warnings;
my $header = <DATA>;
chomp $header;
$header = join "\t", (split /\t/, $header), 'C5';
my (%record, @lines);
while (<DATA>) {
chomp;
my @cols = split /\t/;
$record{$cols[1]} += $cols[3];
push @lines, \@cols;
}
print "$header\n";
foreach (@lines) {
print (join "\t", (@$_, $record{$_->[1]}), "\n");
}
__DATA__
C1 C2 C3 C4
a x e 1
b x 3
c y 1
d z f 4
Upvotes: 5
Reputation: 50637
perl -lane'
BEGIN { $, ="\t" }
print(@F, "C4"),next if $. ==1;
$s{ $F[1] } += $F[2];
push @r, [ @F ];
END {
print @$_, $s{$_->[1]} for @r;
}
' file
output
C1 C2 C3 C4
a x 1 4
b x 3 4
c y 1 1
d z 4 4
script version,
use strict;
use warnings;
local $, ="\t";
local $\ = "\n";
my %s;
my @r;
while (<>) {
chomp;
my @F = split;
print(@F, "C4"),next if $. ==1;
$s{ $F[1] } += $F[2];
push @r, [ @F ];
}
print @$_, $s{$_->[1]} for @r;
Upvotes: 1
Reputation: 45526
If you don't mind reading the file twice this can be achieved very easily using awk
as well:
$ awk -v OFS='\t' 'NR==FNR{a[$2]+=$3;next}{print $1,$2,$3,$2=="C2"?"C4":a[$2]}' t.txt t.txt
C1 C2 C3 C4
a x 1 4
b x 3 4
c y 1 1
d z 4 4
and if you don't mind fixing the column header manually the following suffices:
$ awk -v OFS='\t' 'NR==FNR{a[$2]+=$3;next}{print $1,$2,$3,a[$2]}' t.txt t.txt
C1 C2 C3 0
a x 1 4
b x 3 4
c y 1 1
d z 4 4
Upvotes: 0
Reputation: 166
One simple solution would be to put the values from C3 into an array and then print each line's contents up to C3. For column i of C4, compare the contents of array[i-1] to array[i] and array[i+1] to array[i] i.e. compare the value of the array of the particular line with that of the previous and of the next line. If they are equal, add them, print them and move to next line.
This can be easily altered for cases where C3 is equal for more than 2 consecutive values.
Upvotes: 0