David

Reputation:

How can I collate and summarize records from a file with Perl?

I have a text file in the following format:

211B1 CUSTOMER|UPDATE|  
211B2 CUSTOMER|UPDATE|  
211B3 CUSTOMER|UPDATE|  
211B4 CUSTOMER|UPDATE|  
211B5 CUSTOMER|UPDATE|  
567FR CUSTOMER|DELETE|  
647GI CUSTOMER|DELETE|  

I want a script that processes the text file and reports the following:

I can script simple solutions, but this seems a little complex to me, and I would appreciate assistance or guidance.

Upvotes: 1

Views: 436

Answers (5)

Straff

Reputation: 5749

Another awk version, though it reverses the order of the account codes and leaves an extra "," at the end of each line:


BEGIN { FS="[ |]" }

{
        key = $3 " for column " $2
        MAP[ key ] = $1 "," MAP[ key ]
}

END {
        for ( item in MAP ) {
                print item " found for Acct's: " MAP[ item ]
        }
}

Upvotes: 0

Sinan Ünür

Reputation: 118118

#!/usr/bin/perl

use strict;
use warnings;

my %data;

while ( my $line = <DATA> ) {
    next unless $line =~ /\S/;
    my ($acct, $col, $action) = split /\s|\|/, $line;
    push @{ $data{$action}->{$col} }, $acct;
}

for my $action ( keys %data ) {
    for my $col ( keys %{ $data{$action} } ) {
        # Parenthesize join so "\n" is printed after the list, not joined into it
        print qq{"$action" for column $col found for acct's: },
              join( q{,}, @{ $data{$action}->{$col} } ), "\n";
    }

}
__DATA__
211B1 CUSTOMER|UPDATE|  
211B2 CUSTOMER|UPDATE|  
211B3 CUSTOMER|UPDATE|  
211B4 CUSTOMER|UPDATE|  
211B5 CUSTOMER|UPDATE|  
567FR CUSTOMER|DELETE|  
647GI CUSTOMER|DELETE|

Upvotes: 1

paxdiablo

Reputation: 881113

With awk:

echo '211B1 CUSTOMER|UPDATE|  
211B2 CUSTOMER|UPDATE|  
211B3 CUSTOMER|UPDATE|  
211B4 CUSTOMER|UPDATE|  
211B5 CUSTOMER|UPDATE|  
567FR CUSTOMER|DELETE|  
647GI CUSTOMER|DELETE|' | awk -F '[ |]' '
    BEGIN {
        upd="";del=""
    } {
      if ($3 == "UPDATE") {upd = upd" "$1};
      if ($3 == "DELETE") {del = del" "$1};
    } END {
        print "Updates:"upd; print "Deletes:"del
    }'

produces:

Updates: 211B1 211B2 211B3 211B4 211B5
Deletes: 567FR 647GI

It basically just splits each line on spaces and pipes (the -F option), so the first three fields are the account, the column, and the "command", and it appends the account to a list of updates or deletes depending on that command.

The BEGIN and END blocks run before and after all line processing, so they handle the initialization and the final output.
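To see the splitting and the BEGIN/END ordering in isolation, something like the following should do. It is only a sketch: the `show_fields` name and the header and footer text are made up for illustration.

```shell
# Hypothetical helper: print the three fields awk sees on each line.
show_fields() {
    awk -F '[ |]' '
    BEGIN { print "acct column action" }    # runs once, before any input
    { print $1, $2, $3 }                    # runs for every input line
    END   { print NR " line(s) read" }      # runs once, after all input
    '
}

printf '211B1 CUSTOMER|UPDATE|  \n567FR CUSTOMER|DELETE|  \n' | show_fields
```

With those two input lines it prints the header, then `211B1 CUSTOMER UPDATE` and `567FR CUSTOMER DELETE`, then `2 line(s) read`.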

I'd put it into a script file to make it easier to reuse. I left it as a command-line one-liner only because that's how I usually debug my awk scripts.
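As a rough idea of what the script form might look like, here is one way to move the same logic into a file and run it with `awk -f`. The file names (`summarize.awk`, `input.txt`) and the temporary directory are assumptions for the sake of a self-contained example:

```shell
# Hypothetical file names; a temp directory keeps the example self-contained.
workdir=$(mktemp -d)

cat > "$workdir/summarize.awk" <<'EOF'
BEGIN { FS = "[ |]" }                  # same separator as -F '[ |]'
$3 == "UPDATE" { upd = upd " " $1 }    # collect updated accounts
$3 == "DELETE" { del = del " " $1 }    # collect deleted accounts
END {
    print "Updates:" upd
    print "Deletes:" del
}
EOF

printf '211B1 CUSTOMER|UPDATE|  \n567FR CUSTOMER|DELETE|  \n' > "$workdir/input.txt"

summary=$(awk -f "$workdir/summarize.awk" "$workdir/input.txt")
echo "$summary"
```

Setting `FS` in the BEGIN block replaces the -F option, so the script can be run without extra flags.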

Upvotes: 1

Anonymous

Reputation: 50319

Based on your question, you could do this:

perl -i.bak -pe'if(/^211B[1-5]/){s/CUSTOMER/UPDATE/}elsif(/^(5675FR|6470GI)/){s/CUSTOMER/DELETE/}' filename

Though I notice now that the last two account numbers differ in the example, and also that the second column already has those values...

Upvotes: -2

j_random_hacker

Reputation: 51226

collate.pl

#!/usr/bin/perl

use strict;
use warnings;

my %actions;
while (<>) {
    my ($key, $fld, $action) = /^(\w+) (.+?)\|(.+?)\|/ or die "Failed on line $.!";
    push @{$actions{$action}{$fld}}, $key;
}

foreach my $action (keys %actions) {
    foreach my $fld (keys %{$actions{$action}}) {
        print "\"$action\" for column $fld found for Acct's: " . join(",", @{$actions{$action}{$fld}}), "\n";
    }
}

Use like so:

perl collate.pl < input.txt > output.txt

Upvotes: 6
