jasongallant

Reputation: 91

Alter code to sum values from multiple files

Curious if I can get a little help here. I'm a Perl newbie and can't figure out how to convert the following code into something a bit more useful for my analysis.

This code presently takes the 1st and 4th column from a user-supplied list of data files and puts them together.

What I'd like my code to do, for each row of the "current output" generated by this code (see below), is make a sum of these 4th column values (filea, fileb, filec). Not quite sure how to implement this...

Current output:

        | filea | fileb | filec
entrya  | 0     | 10.2  | 0
entryb  | 0     | 0.0   | 1
entryc  | 8     | 57.0  | 46

Desired output:

        | sum
entrya  | 10.2
entryb  | 1
entryc  | 111

The current code looks like this:

main: {


my %data;

foreach my $file (@rsem_files) {

    open (my $fh, $file) or die "Error, cannot open file $file";
    my $header = <$fh>; # ignore it
    while (<$fh>) {
        chomp;
        my @x = split(/\t/);
        my $acc = $x[0];
        my $count = $x[4];
        $data{$acc}->{$file} = $count;
    }
    close $fh;
}

my @filenames = @rsem_files;
foreach my $file (@filenames) {
    $file = basename($file);
}


print join("\t", "", @filenames) . "\n";
foreach my $acc (keys %data) {

    print "$acc";

    foreach my $file (@rsem_files) {

        my $count = $data{$acc}->{$file};
        unless (defined $count) {
            $count = "NA";
        }

        print "\t$count";

    }

    print "\n";

}


exit(0);
}

Upvotes: 1

Views: 122

Answers (2)

chimpsarehungry

Reputation: 1821

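foreach my $line (@lines) {   # assuming @lines holds the rows of the current output
    # match a line starting with the word "entry" and capture its four columns
    if ($line =~ /^(entry\w*)\s*\|\s*([\d.]+)\s*\|\s*([\d.]+)\s*\|\s*([\d.]+)/) {
        # make variables out of the column values
        my ($entry, $filea, $fileb, $filec) = ($1, $2, $3, $4);
    }
}

Once you have these variables (inside the if block, where they are in scope), you can do math on them.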

Upvotes: 0

Phil H

Reputation: 20131

Alter the inner @rsem_files loop (the one inside the foreach over keys %data):

# create $total variable outside the per-file loop
my $total = 0;
foreach my $file (@rsem_files) {
    my $count = $data{$acc}->{$file};
    # change unless to if, no need for NA
    if (defined $count) {
        $total += $count;
    }
}
# move print outside loop so it happens once instead of per-file;
# double quotes are needed so that \t, $total and \n are interpolated
print "\t$total\n";
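For reference, here is a minimal self-contained sketch of the same idea. It is not the asker's script: instead of storing per-file counts in %data and summing at print time, it keeps a running total per accession while reading. The sub name sum_column is hypothetical, the column index 4 is taken from `$x[4]` in the question's code, and the file paths are assumed to come from the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: sum one column across several tab-separated files,
# keyed by the value in the first column.
# $col is the zero-based column index (4 matches $x[4] in the question).
sub sum_column {
    my ($col, @files) = @_;
    my %total;
    foreach my $file (@files) {
        open(my $fh, '<', $file) or die "Error, cannot open file $file: $!";
        my $header = <$fh>;    # skip the header line
        while (my $line = <$fh>) {
            chomp $line;
            my @x = split /\t/, $line;
            next unless defined $x[$col];    # a missing value adds nothing
            $total{ $x[0] } += $x[$col];
        }
        close $fh;
    }
    return \%total;
}

# Run the report only when this file is executed directly.
unless (caller) {
    my $total = sum_column(4, @ARGV);
    print join("\t", "", "sum"), "\n";
    foreach my $acc (sort keys %$total) {
        print "$acc\t$total->{$acc}\n";
    }
}
```

Saved as, say, sum_counts.pl (an assumed name), it would be run as `perl sum_counts.pl filea fileb filec`. An accession that is absent from a file simply contributes nothing to its total, which matches the "no need for NA" point above.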

Upvotes: 1
