Reputation: 37259
So basically my problem can be written in pseudo-code as follows:
split the line by =
using value before =, find the next line
check this the value after = matches previous
if not, then loop till end of file
collect all the values which match and using the line numbers, get the last 2 columns value
sum all the values for a given set with equal key=value pair.
The dataset I have is as follows:
3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200, 100
3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300, 10
3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000, 80
3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000, 1200
3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000, 500
3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000, 7
What I need to do is, take all values for 3, which are equal and get the summation of the last 2 columns and sum it for that value. Example:
3 = 5002, sum = 500, 110
5 = 0, sum = 1300, 90
8 = 2, sum = 15000, 1700
I have been able to parse the first 3, but am unable to do anything for the rest of the columns :-(
Upvotes: 0
Views: 867
Reputation: 8406
Well, it seems that everybody is trying to understand what you really want. I don't understand it, but it seems that you only want to capture the sum of all lines that contain a given key=value pair. Except, you don't actually care for the key.
Or something like that.
So, my question would be: can you provide the expected output for the example set of data?
Anyways, here's my try (the '#/' comments are just to help the syntax highlighter.)
#!/usr/bin/perl
use strict;
use warnings;
my %h;
my @ord_keys;
while (<DATA>) {
chomp;
my @cols = split /,\s*/; #/
my $val = pop @cols;
foreach my $k (@cols) {
if (exists($h{$k})) {
$h{$k} += $val;
} else {
push @ord_keys, $k;
$h{$k} = $val;
}
}
}
foreach my $key (@ord_keys) {
my ($k, $v) = split /=/, $key; #/
print "$k = $v, sum = $h{$key}\n";
}
__DATA__
3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200
3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300
3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000
3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000
3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000
3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000
And the results:
3 = 5002, sum = 500
0 = 10002, sum = 500
5 = 1, sum = 10200
4 = 1, sum = 11500
7 = 1, sum = 200
8 = 1, sum = 1200
9 = 0, sum = 2200
1 = 14002, sum = 500
6 = 5, sum = 500
5 = 0, sum = 1300
7 = 0, sum = 1300
8 = 0, sum = 1300
9 = 1, sum = 15300
3 = 5001, sum = 11000
0 = 10001, sum = 1000
4 = 0, sum = 6000
1 = 14001, sum = 11000
6 = 3, sum = 11000
0 = 10004, sum = 16000
7 = 2, sum = 16000
8 = 2, sum = 15000
3 = 5003, sum = 6000
5 = 2, sum = 5000
1 = 14003, sum = 6000
6 = 8, sum = 6000
5 = 3, sum = 1000
Comments welcome.
Upvotes: 0
Reputation: 118118
Based on my understanding, here are two possible methods. The first one uses composite keys to store values in a single level hash. The second one uses a multi-level hash:
Method 1:
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw( sum );
my %data;
while ( my $line = <DATA> ) {
chomp $line;
my @parts = split /, /, $line;
last unless @parts;
my $value = pop @parts;
push @{ $data{$_} }, $value for @parts;
}
for my $col ( sort keys %data ) {
printf("%12s:%9d\n", $col, sum @{ $data{$col} } );
}
__DATA__
3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200
3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300
3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000
3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000
3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000
3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000
C:\Temp> hj
3=5001: 11000
3=5002: 500
3=5003: 6000
0=10001: 1000
0=10002: 500
0=10004: 16000
1=14001: 11000
1=14002: 500
1=14003: 6000
4=0: 6000
4=1: 11500
5=0: 1300
5=1: 10200
5=2: 5000
5=3: 1000
6=3: 11000
6=5: 500
6=8: 6000
7=0: 1300
7=1: 200
7=2: 16000
8=0: 1300
8=1: 1200
8=2: 15000
9=0: 2200
9=1: 15300
Method: 2
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw( sum );
my %data;
while ( my $line = <DATA> ) {
chomp $line;
my @parts = split /, /, $line;
last unless @parts;
my $value = $parts[-1];
for ( my $i = 0 ; $i < @parts - 2; ++$i ) {
my @subparts = split /=/, $parts[$i];
push @{ $data{$subparts[0]}->{$subparts[1]} }, $value;
}
}
for my $k1 ( keys %data ) {
for my $k2 ( keys %{ $data{$k1} } ) {
printf(
"%2d:%6d:%9d \n",
$k1, $k2, sum @{ $data{$k1}->{$k2} }
);
}
}
__DATA__
3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200
3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300
3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000
3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000
3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000
3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000
C:\Temp> hjk
3: 5003: 6000
3: 5002: 500
3: 5001: 11000
7: 1: 200
7: 0: 1300
7: 2: 16000
9: 1: 15300
9: 0: 2200
8: 1: 1200
8: 0: 1300
8: 2: 15000
4: 1: 11500
4: 0: 6000
1: 14001: 11000
1: 14003: 6000
1: 14002: 500
0: 10001: 1000
0: 10004: 16000
0: 10002: 500
5: 1: 10200
5: 3: 1000
5: 0: 1300
5: 2: 5000
NB: Add sort
to taste.
Upvotes: 3
Reputation: 74202
I hope this is what you are looking for:
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;
my %data;
my $csv = Text::CSV_XS->new();
while ( <DATA> ) {
$csv->parse($_);
my @fields = $csv->fields();
$fields[0] =~ s/^3=//;
$data{ $fields[0] } += $fields[9];
}
use Data::Dumper;
print Dumper \%data;
__DATA__
3=5002, 0=10002, 5=1, 4=1, 7=1, 8=1, 9=0, 1=14002, 6=5, 200
3=5002, 0=10002, 5=0, 4=1, 7=0, 8=0, 9=1, 1=14002, 6=5, 300
3=5001, 0=10001, 5=0, 4=0, 7=0, 8=0, 9=0, 1=14001, 6=3, 1000
3=5001, 0=10004, 5=1, 4=1, 7=2, 8=2, 9=1, 1=14001, 6=3, 10000
3=5003, 0=10004, 5=2, 4=0, 7=2, 8=2, 9=1, 1=14003, 6=8, 5000
3=5003, 0=10004, 5=3, 4=1, 7=2, 8=1, 9=0, 1=14003, 6=8, 1000
Upvotes: 0
Reputation: 4637
How about splitting on ",". Then you can pull off the last element and pair it with each element from the list. For your first line you would end up with the following pairs:
3=5002, 200
0=10002, 200
5=1, 200
4=1, 200
7=1, 200
8=1, 200
9=0, 200
1=14002, 200
6=5, 200
Add each one of those pairs to a master list. Once you get that you can sort by the first element in the pair and sum.
Upvotes: 1
Reputation: 643
The way you explained your problem is not very clear. Based on my understanding, this would be my approach:
Create a 2d array containing the different comma delimited fields maintaining the row, column structure.
Analyze each column and create a hash mapping the each data value to the rows containing it.
IE: For column one you'd have a hash
3=5002 0,1
3=5001 2,3
3=5003 4,5
Then you go through each entry of the hash and sum the last member of the rows listed for the different data.
Repeat for each column excluding the last.
Upvotes: 0