Reputation: 31

Counting element in occurrence counting

I have a text file that contains some subject results. How can I produce a count table like the one below in Perl?

result.txt

Math Peter pass
English Peter pass
Music Peter fail
Science Peter fail
Art Mary fail
Music Mary fail
English Mary fail
Math Bob pass
English Bob fail
Art Bob pass
Music Bob fail
English Mike pass
Science Mike pass

Output

name    pass    fail
Peter   2       2
Mary    0       3
Bob     2       2
Mike    2       0

I have already tried this and can successfully print a dump of the data in split form

#!/usr/bin/perl

use strict;

use Data::Dumper;

my $CurrentPath = "/tmp";

open(FILE, "/tmp/result.txt") or die("Cannot open file result.txt for reading: $!");
my @results = <FILE>;
s{^\s+|\s+$}{}g foreach @results;
close FILE;

my @data_split = ();

foreach my $result ( @results ) {
    push @data_split, [ split /\s+/, $result ];
}

print Dumper \@data_split;

1;

output

$VAR1 = [
          [
            'Math',
            'Peter',
            'pass'
          ],
          [
            'English',
            'Peter',
            'pass'
          ],
          ...............

Upvotes: 1

Answers (2)

zdim

Reputation: 66964

To count and manage a range of items, and for any structured information, hashes are very useful. Together with arrays, and by way of references, they let you build data structures that encode rather complex relationships. See perldsc. Once these become unwieldy, the next step is to write a class.

It can go along these lines

use warnings;
use strict;
use feature 'say';   
use Data::Dump qw(dd);

my $file = shift @ARGV;
die "Usage: $0 filename\n" if not $file;

open my $fh, '<', $file or die "Can't open $file: $!";

my %results;

while (<$fh>) {
    next if not /\S/;  # skip empty lines

    my ($subj, $name, $grade) = split;

    if (not $subj or not $name or not defined $grade) {
       warn "Incomplete data, line: $_";
       next;
    }

    if ($grade eq 'pass') {
        $results{$name}->{pass}++;
    }  
    elsif ($grade eq 'fail') {
        $results{$name}->{fail}++;
    }  
    else { 
        warn "Unknown grade format for $name in $subj: $grade";
        next;
    }
}
dd \%results;

A name that never gets a particular score has no key for that score in its hashref. If you need those entries then post-process %results to add them, for example

foreach my $name (keys %results) {
    $results{$name}->{pass} = 0 if not exists $results{$name}->{pass};
    $results{$name}->{fail} = 0 if not exists $results{$name}->{fail};
}

Or, add a statement to initialize both scores for each record (name) in the code.
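
A minimal sketch of that in-loop initialization, assuming Perl 5.10 or later for the //= operator. The @lines array is a hypothetical stand-in for the lines read from the file, and input checking is left out to keep it short:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines of the file
my @lines = (
    "Math Peter pass",
    "Art Mary fail",
    "Music Mary fail",
);

my %results;

for my $line (@lines) {
    my ($subj, $name, $grade) = split ' ', $line;

    # First time this name is seen: create both counters at zero
    $results{$name} //= { pass => 0, fail => 0 };

    $results{$name}{$grade}++;
}

# Every name now has both keys, so no defaulting is needed when printing
for my $name (sort keys %results) {
    print "$name\t$results{$name}{pass}\t$results{$name}{fail}\n";
}
```

With this in place the post-processing loop is not needed, since Mary gets a pass count of 0 as soon as her first line is read.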

Note that we can enlarge this data structure to store more information ($subj for instance), as the need arises, cleanly and with small code changes. This is another benefit of using hashes.
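
As one possible illustration of such an extension, the sketch below keeps the list of subjects under each grade instead of a bare counter, so the count becomes the length of the list. The @lines array is hypothetical sample data, and this layout is only one of several that would work:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines of the file
my @lines = (
    "Math Peter pass",
    "English Peter pass",
    "Music Peter fail",
);

my %results;

for my $line (@lines) {
    my ($subj, $name, $grade) = split ' ', $line;

    # Store the subject itself; autovivification creates the array ref
    push @{ $results{$name}{$grade} }, $subj;
}

for my $name (sort keys %results) {
    for my $grade (qw(pass fail)) {
        # A grade that never occurred has no entry, so fall back to an empty list
        my @subjects = @{ $results{$name}{$grade} // [] };
        printf "%s %s %d (%s)\n",
            $name, $grade, scalar @subjects, join(', ', @subjects);
    }
}
```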

A few comments on the posted code

Why is there no use warnings; at the beginning? You must have that; it is directly useful
Use lexical filehandles, open my $fh, '<', $file ... instead of globs (FILE)
Process files a line at a time unless there is a specific reason to read the whole thing first
Always check input against what you expect it to be; what exactly that is depends on your problem and design. In the code above all fields must be present and sensible (a grade of 0 is allowed), while you may in fact want to accept a missing grade; adjust as suitable. Of course, if every file that this program will ever read is like the one you show, always with all fields and only pass/fail grades, then there is no need to check input
The pattern /\s+/ in split should almost always be replaced with ' ', which splits on the same whitespace but also discards leading whitespace. It is also the default pattern, with $_ as the default string to split, hence the bare split; above
You can build a data structure like the one used here from what you have as well. However, I don't see a benefit of an array of arrays here
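
The difference between the two split patterns shows up on a line with leading whitespace. A small sketch, where $line is a made-up example:

```perl
use strict;
use warnings;

# Made-up input line with leading spaces and a trailing newline
my $line = "  Math Peter pass\n";

# /\s+/ matches the leading spaces, which yields an empty first field
my @by_regex = split /\s+/, $line;

# ' ' is the special case: leading whitespace is skipped first
my @by_space = split ' ', $line;

print scalar(@by_regex), " fields with /\\s+/\n";
print scalar(@by_space), " fields with ' '\n";
```

In both cases the trailing empty field caused by the newline is dropped, so only the leading whitespace makes a difference: the first call returns four fields, the second returns three.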

The increment of the appropriate counter can be written more concisely

while (<$fh>) {
    next if not /\S/;
    my ($subj, $name, $grade) = split;
    # check input ...

    if ($grade !~ /^(?:pass|fail)$/) {
        warn "Unknown grade format for $name in $subj: $grade";
        next;
    }   

    $results{$name}->{$grade}++;
}

If you'd rather have your code quietly accept anything in the third field and store it in %results with its count, then remove the check against pass|fail.

Upvotes: 2

Borodin

Reputation: 126772

This is a very basic solution that does no checking of the input data. It initialises each hash value to { pass => 0, fail => 0 } so that there are no "missing" values that need to be defaulted.

Note that Perl hashes are unordered, so the order of the output is also indeterminate. If you need a specific order then you must say so.

use strict;
use warnings 'all';
use feature 'say';

open my $fh, '<', 'result.txt' or die $!;

my %grades;

while ( <$fh> ) {

    my ($class, $name, $grade) = split;

    $grades{$name} //= { pass => 0, fail => 0 };
    ++$grades{$name}{$grade};
}

say "name\tpass\tfail";

for ( keys %grades ) {
    say join "\t", $_, @{ $grades{$_} }{qw/ pass fail /};
}

output

name    pass    fail
Mary    0       3
Bob     2       2
Mike    2       0
Peter   2       2
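
The rows above come out in whatever order keys returns. If they should instead follow the order in which names first appear in the file, as in the question's table, one option is to record each name in an array the first time it is seen. A minimal sketch of that idea, where @lines (sample data) and @order are illustrative names that are not part of the answer's code:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines of the file
my @lines = (
    "Math Peter pass",
    "Music Peter fail",
    "Art Mary fail",
    "Math Bob pass",
);

my (%grades, @order);

for my $line (@lines) {
    my ($class, $name, $grade) = split ' ', $line;

    if (not exists $grades{$name}) {
        $grades{$name} = { pass => 0, fail => 0 };
        push @order, $name;    # remember first-seen order
    }

    ++$grades{$name}{$grade};
}

print join("\t", qw(name pass fail)), "\n";

# Iterate over the recorded order rather than over keys %grades
for my $name (@order) {
    print join("\t", $name, @{ $grades{$name} }{qw(pass fail)}), "\n";
}
```

Sorting the keys (for example, for my $name (sort keys %grades)) is the simpler choice when alphabetical order is enough.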

Upvotes: -1
