Chris

Reputation: 43

Perl - Summarize Data in File

What's the best way to summarize data from a file that has around 2 million records in Perl?

For example, a file like this:

ABC|XYZ|DEF|EGH|100
ABC|XYZ|DEF|FGH|200
SDF|GHT|WWW|RTY|1000
SDF|GHT|WWW|TYU|2000

needs to be summarized on the first three columns, like this:

ABC|XYZ|DEF|300
SDF|GHT|WWW|3000

Upvotes: 4

Views: 1203

Answers (5)

Axeman

Reputation: 29844

1-2-3-4 I declare A CODE-GOLF WAR!!! (Okay, a reasonably readable code-golf dust-up.)

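my %sums;
# Anchored so the captured number is always the last column, even if an
# earlier column happens to be numeric.
m/^([^|]+\|[^|]+\|[^|]+)\|(?:.*\|)?(\d+)$/ and $sums{ $1 } += $2 while <>;
print join( "\n", ( map { "$_|$sums{$_}" } sort keys %sums ), '' );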

Upvotes: 0

tschubring

Reputation: 14

Sort the file so that all records with the same first three columns end up next to each other. Then iterate through the sorted records and emit a subtotal whenever the three-column key changes.

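# $inFile and $outFile are assumed to be set elsewhere.
my $prevKey;
my $subtotal = 0;
open(my $in, '<', $inFile) or die "Cannot open $inFile: $!";
my @lines = <$in>;
close($in);
open(my $out, '>', $outFile) or die "Cannot open $outFile: $!";
my @sorted = sort(@lines);
foreach my $line (@sorted) {    # iterate over the sorted records, not @lines
    chomp($line);
    my @parts = split(/\|/, $line);
    my $value = pop(@parts);    # last column is the amount
    my $key = join('|', @parts[0..2]);
    if (defined $prevKey && $key ne $prevKey) {
        # key changed: write out the group that just ended
        print $out "$prevKey|$subtotal\n";
        $subtotal = 0;
    }
    $prevKey = $key;
    $subtotal += $value;
}
# write out the final group, which the loop itself never flushes
print $out "$prevKey|$subtotal\n" if defined $prevKey;
close($out);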

If sorting 2 million chokes your box then you may have to put each record into a file based on the group and then do the subtotal for each file.
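A different way to get around the memory problem, sketched here rather than the per-group files described above, is to let an external sort do the ordering and have Perl stream over the result one line at a time, so only the current key and subtotal are held in memory. This is a minimal sketch under some assumptions: the input is already sorted, the key is the first three columns, the value is the last column, and `summarize_sorted` is a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: reads pre-sorted, pipe-delimited records from the
# filehandle $in and prints one "key|subtotal" line per group to $out.
# Assumes the key is the first three columns and the value is the last.
sub summarize_sorted {
    my ($in, $out) = @_;
    my $prev_key;
    my $subtotal = 0;
    while (my $line = <$in>) {
        chomp $line;
        next unless length $line;              # skip blank lines
        my @cols = split /\|/, $line;
        my $key  = join '|', @cols[0 .. 2];
        if (defined $prev_key && $key ne $prev_key) {
            # key changed: the previous group is complete
            print {$out} "$prev_key|$subtotal\n";
            $subtotal = 0;
        }
        $prev_key = $key;
        $subtotal += $cols[-1];
    }
    # flush the last group
    print {$out} "$prev_key|$subtotal\n" if defined $prev_key;
    return;
}
```

To use it as a filter, you could call summarize_sorted(\*STDIN, \*STDOUT) at the end of the script and run something like: LC_ALL=C sort inputfile | perl summarize.pl > outputfile (the script name is only an example). Byte-wise sorting keeps every line that shares the same three-column prefix adjacent, which is all the loop relies on.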

Upvotes: -1

sorpigal

Reputation: 26086

This presumes your input file has one record per line:

perl -n -e 'chomp;@a=split/\|/;$h{join"|",splice@a,0,3}+=pop@a;END{print map{"$_|$h{$_}\n"}sort keys%h}' < inputfile

Upvotes: 0

Platinum Azure

Reputation: 46183

Assuming there are always five columns, the fifth of which is numeric, and you always want the first three columns to be the key...

use warnings;
use strict;

my %totals_hash;

while (<>)
{
  chomp;
  my @cols = split /\|/;

  my $key = join '|', @cols[0..2];

  $totals_hash{$key} += $cols[4];
}

foreach (sort keys %totals_hash)
{
  print $_, '|', $totals_hash{$_}, "\n";
}

Upvotes: 3

codaddict

Reputation: 454930

You can use a hash:

my %hash;
while (<DATA>) {                 # reads the __DATA__ section; use <> to read a file instead
        chomp;
        my @tmp = split/\|/;     # split each line on |
        my $value = pop @tmp;    # last element is the value
        pop @tmp;                # discard the unwanted fourth column
        my $key = join '|',@tmp; # join the remaining elements to form the key

        $hash{$key} += $value;   # add value for this key
}

# print hash key-values.
for(sort keys %hash) {
        print $_ . '|'.$hash{$_}."\n";
}

Upvotes: 2
