Reputation: 37259

Parsing comma separated lines and calculating sum

So basically my problem can be written in pseudo-code as follows:

split the line by =
using value before =, find the next line
check this the value after = matches previous
if not, then loop till end of file
collect all the values which match and using the line numbers, get the last 2 columns value
sum all the values for a given set with equal key=value pair.

The dataset I have is as follows:

3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200, 100
3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300, 10
3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000, 80
3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000, 1200
3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000, 500
3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000, 7

What I need to do is, take all values for 3, which are equal and get the summation of the last 2 columns and sum it for that value. Example:

3 = 5002, sum = 500, 110
5 = 0, sum = 1300, 90
8 = 2, sum = 15000, 1700

I have been able to parse the first 3, but am unable to do anything for the rest of the columns :-(

Upvotes: 0

Answers (5)

Leonardo Herrera

Reputation: 8406

Well, it seems that everybody is trying to understand what you really want. I don't understand it, but it seems that you only want to capture the sum of all lines that contain a given key=value pair. Except, you don't actually care for the key.

Or something like that.

So, my question would be: can you provide the expected output for the example set of data?

Anyways, here's my try (the '#/' comments are just to help the syntax highlighter.)

#!/usr/bin/perl
use strict;
use warnings;
my %h;
my @ord_keys;
while (<DATA>) {
    chomp;
    my @cols = split /,\s*/; #/
    my $val = pop @cols;

    foreach my $k (@cols) {
        if (exists($h{$k})) {
            $h{$k} += $val;
        } else {
            push @ord_keys, $k;
            $h{$k} = $val;
        }
    }
}

foreach my $key (@ord_keys) {
    my ($k, $v) = split /=/, $key; #/
    print "$k = $v, sum = $h{$key}\n";
}

__DATA__
3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200
3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300
3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000
3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000
3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000
3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000

And the results:

3 = 5002, sum = 500
0 = 10002, sum = 500
5 = 1, sum = 10200
4 = 1, sum = 11500
7 = 1, sum = 200
8 = 1, sum = 1200
9 = 0, sum = 2200
1 = 14002, sum = 500
6 = 5, sum = 500
5 = 0, sum = 1300
7 = 0, sum = 1300
8 = 0, sum = 1300
9 = 1, sum = 15300
3 = 5001, sum = 11000
0 = 10001, sum = 1000
4 = 0, sum = 6000
1 = 14001, sum = 11000
6 = 3, sum = 11000
0 = 10004, sum = 16000
7 = 2, sum = 16000
8 = 2, sum = 15000
3 = 5003, sum = 6000
5 = 2, sum = 5000
1 = 14003, sum = 6000
6 = 8, sum = 6000
5 = 3, sum = 1000

Comments welcome.

Upvotes: 0

Sinan Ünür

Reputation: 118118

Based on my understanding, here are two possible methods. The first one uses composite keys to store values in a single level hash. The second one uses a multi-level hash:

Method 1:

#!/usr/bin/perl

use strict;
use warnings;

use List::Util qw( sum );

my %data;

while ( my $line = <DATA> ) {
    chomp $line;

    my @parts = split /, /, $line;
    last unless @parts;

    my $value = pop @parts;

    push @{ $data{$_} }, $value for @parts;
}

for my $col ( sort keys %data ) {
    printf("%12s:%9d\n", $col, sum @{ $data{$col} } );
}

__DATA__
3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200
3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300
3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000
3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000
3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000
3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000

C:\Temp> hj
  3=5001:    11000
  3=5002:      500
  3=5003:     6000
 0=10001:     1000
 0=10002:      500
 0=10004:    16000
 1=14001:    11000
 1=14002:      500
 1=14003:     6000
     4=0:     6000
     4=1:    11500
     5=0:     1300
     5=1:    10200
     5=2:     5000
     5=3:     1000
     6=3:    11000
     6=5:      500
     6=8:     6000
     7=0:     1300
     7=1:      200
     7=2:    16000
     8=0:     1300
     8=1:     1200
     8=2:    15000
     9=0:     2200
     9=1:    15300

Method: 2

#!/usr/bin/perl

use strict;
use warnings;

use List::Util qw( sum );

my %data;

while ( my $line = <DATA> ) {
    chomp $line;

    my @parts = split /, /, $line;
    last unless @parts;

    my $value = $parts[-1];

    for ( my $i = 0 ; $i < @parts - 2; ++$i ) {
        my @subparts = split /=/, $parts[$i];
        push @{ $data{$subparts[0]}->{$subparts[1]} }, $value;
    }
}

for my $k1 ( keys %data ) {
    for my $k2 ( keys %{ $data{$k1} } ) {
        printf(
            "%2d:%6d:%9d \n",
            $k1, $k2, sum @{ $data{$k1}->{$k2} }
        );
    }
}

__DATA__
3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200
3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300
3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000
3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000
3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000
3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000

C:\Temp> hjk
 3:  5003:     6000
 3:  5002:      500
 3:  5001:    11000
 7:     1:      200
 7:     0:     1300
 7:     2:    16000
 9:     1:    15300
 9:     0:     2200
 8:     1:     1200
 8:     0:     1300
 8:     2:    15000
 4:     1:    11500
 4:     0:     6000
 1: 14001:    11000
 1: 14003:     6000
 1: 14002:      500
 0: 10001:     1000
 0: 10004:    16000
 0: 10002:      500
 5:     1:    10200
 5:     3:     1000
 5:     0:     1300
 5:     2:     5000

NB: Add sort to taste.

Upvotes: 3

Alan Haggai Alavi

Reputation: 74202

I hope this is what you are looking for:

#!/usr/bin/perl

use strict;
use warnings;

use Text::CSV_XS;

my %data;
my $csv = Text::CSV_XS->new();
while ( <DATA> ) {
    $csv->parse($_);
    my @fields = $csv->fields();
    $fields[0] =~ s/^3=//;
    $data{ $fields[0] } += $fields[9];
}

use Data::Dumper;
print Dumper \%data;

__DATA__
3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200
3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300
3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000
3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000
3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000
3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000

Upvotes: 0

mamboking

Reputation: 4637

How about splitting on ",". Then you can pull off the last element and pair it with each element from the list. For your first line you would end up with the following pairs:

3=5002, 200
0=10002, 200
5=1, 200
4=1, 200
7=1, 200
8=1, 200
9=0, 200
1=14002, 200
6=5, 200

Add each one of those pairs to a master list. Once you get that you can sort by the first element in the pair and sum.

Upvotes: 1

dborba

Reputation: 643

The way you explained your problem is not very clear. Based on my understanding, this would be my approach:

Create a 2d array containing the different comma delimited fields maintaining the row, column structure.
Analyze each column and create a hash mapping the each data value to the rows containing it.

IE: For column one you'd have a hash
3=5002 0,1
3=5001 2,3
3=5003 4,5

Then you go through each entry of the hash and sum the last member of the rows listed for the different data.
Repeat for each column excluding the last.

Upvotes: 0

Parsing comma separated lines and calculating sum

Answers (5)

Related Questions