user1107055
user1107055

Reputation: 41

Reading in a CSV File in Perl

I have read files in Perl before, but not when the CSV file has the values I require on different lines. I assume I have to create an array mixed with hash keys but I'm out of my league here.

Basically, my CSV file has the following columns: branch, job, timePeriod, periodType, day1Value, day2Value, day3Value, day4Value, day4Value, day6Value, and day7Value.

The day* values represent the value of a periodType for each day of the week respectively.

For Example -

East,Banker,9AM-12PM,Overtime,4.25,0,0,1.25,1.5,1.5,0,0
West,Electrician,12PM-5PM,Regular,4.25,0,0,-1.25,-1.5,-1.5,0,0
North,Janitor,5PM-12AM,Variance,-4.25,0,0,-1.25,-1.5,-1.5,0,0
South,Manager,12A-9AM,Overtime,77.75,14.75,10,10,10,10,10,

Etc.

I need to output a file that takes this data and keys off of branch, job, timePeriod, and day. My output would list each periodType value for one particular day rather than one periodType value for all seven.

For example -

South,Manager,12A-9AM,77.75,14.75,16

In the line above, the last 3 values represent the three periodTypes (Overtime, Regular, and Variance) day1Values.

As you can see, my problem is I don't know how to load into memory the data in a manner which allows me to pull the data from different lines and output it successfully. I've only parsed off of singular lines before.

Upvotes: 4

Views: 4602

Answers (2)

Jonathan Leffler
Jonathan Leffler

Reputation: 753495

Unless you like pain, use Text::CSV and its relatives Text::CSV_XS and Text::CSV_PP.

However, that may be the easier part of this problem. Once you've read and validated that the line is complete, you need to add the relevant information to the correctly keyed hashes. You're probably going to have to get rather intimately familiar with references, too.

You might create a hash %BranchData keyed by the branch. Each element of that hash would be a reference to a hash keyed by job; and each element in that would be a reference to a hash keyed by timePeriod, and each element in that would be reference to an array keyed by day number (using indexes 1..7; it over allocates space slightly, but the chances of getting it right are vastly greater; do not mess with $[ though!). And each element of the array would be a reference to a hash keyed by the three period types. Ouch!

If everything is working well, a prototypical assignment might be something like:

$BranchData{$row{branch}}->{$row{job}}->{$row{period}}->[1]->{$row{p_type}} +=
    $row{day1};

You would be iterating of elements 1..7 and 'day1' .. 'day7'; there's a bit of clean-up on the design work to do there.

You have to worry about initializing stuff correctly (or maybe you don't - Perl will do it for you). I'm assuming that the row is returned as a direct hash (rather than a hash reference), with keys for branch, job, period, period type (p_type), and each day ('day1', .. 'day7').

If you know which day you need in advance, you can avoid accumulating all days, but it may make more generalized reporting simpler to read and accumulate all the data all the time, and then simply have the printing deal with whatever subset of the entire data needs to be processed.


It was intriguing enough a problem that I've hacked together this code. I doubt if it is optimal, but it does work.

#!/usr/bin/env perl
#
# SO 8570488

use strict;
use warnings;
use Text::CSV;
use Data::Dumper;
use constant debug => 0;

my $file = "input.csv";
my $csv = Text::CSV->new({ binary => 1, eol => $/ })
                   or die "Cannot use CSV: ".Text::CSV->error_diag();
my @headings = qw( branch job period p_type day1 day2 day3 day4 day5 day6 day7 );
my @days     = qw( day0 day1 day2 day3 day4 day5 day6 day7 );
my %BranchData;

open my $in, '<', $file or die "Unable to open $file for reading ($!)";

$csv->column_names(@headings);
while (my $row = $csv->getline_hr($in))
{
    print Dumper($row) if debug;
    my %r = %$row;  # Not for efficiency; for notational compactness
    $BranchData{$r{branch}} = { } if !defined $BranchData{$r{branch}};
    my $branch = $BranchData{$r{branch}};
    $branch->{$r{job}} = { } if !defined $branch->{$r{job}};
    my $job = $branch->{$r{job}};
    $job->{$r{period}} = [ ] if !defined $job->{$r{period}};
    my $period = $job->{$r{period}};
    for my $day (1..7)
    {
        # Assume that Overtime, Regular and Variance are the only types
        # Otherwise, you need yet another level of checking whether elements exist...
        $period->[$day] = { Overtime => 0, Regular => 0, Variance => 0} if !defined $period->[$day];
        $period->[$day]->{$r{p_type}} += $r{$days[$day]};
    }
}

print Dumper(\%BranchData);

Given your sample data, the output from this is:

$VAR1 = {
    'West' => {
        'Electrician' => {
            '12PM-5PM' => [
                undef,
                {
                    'Regular'  => '4.25',
                    'Overtime' => 0,
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => 0
                },
                {
                    'Regular'  => '-1.25',
                    'Overtime' => 0,
                    'Variance' => 0
                },
                {
                    'Regular'  => '-1.5',
                    'Overtime' => 0,
                    'Variance' => 0
                },
                {
                    'Regular'  => '-1.5',
                    'Overtime' => 0,
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => 0
                }
            ]
        }
    },
    'South' => {
        'Manager' => {
            '12A-9AM' => [
                undef,
                {
                    'Regular'  => 0,
                    'Overtime' => '77.75',
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => '14.75',
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 10,
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 10,
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 10,
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 10,
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 10,
                    'Variance' => 0
                }
            ]
        }
    },
    'North' => {
        'Janitor' => {
            '5PM-12AM' => [
                undef,
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => '-4.25'
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => '-1.25'
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => '-1.5'
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => '-1.5'
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => 0
                }
            ]
        }
    },
    'East' => {
        'Banker' => {
            '9AM-12PM' => [
                undef,
                {
                    'Regular'  => 0,
                    'Overtime' => '4.25',
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => '1.25',
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => '1.5',
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => '1.5',
                    'Variance' => 0
                },
                {
                    'Regular'  => 0,
                    'Overtime' => 0,
                    'Variance' => 0
                }
            ]
        }
    }
};

Have fun taking it from here!

Upvotes: 15

user554546
user554546

Reputation:

I don't have firsthand experience with it, but you can use DBD::CSV and then pass the relatively simple SQL query needed to compute the aggregation that you want.

If you insist on doing it the hard way, though, you can loop through and gather your data in the following hash of hash references:

(
  "branch1,job1,timeperiod1"=>
    {
      "overtime"=>"overtimeday1value1",
      "regular"=>"regulartimeday1value1",
      "variance"=>"variancetimeday1value1"
    },
  "branch2,job2,timeperiod2"=>
    {
      "overtime"=>"overtimeday1value2",
      "regular"=>"regulartimeday1value2",
      "variance"=>"variancetimeday1value2"
    },
  #etc
);

and then just loop through the keys accordingly. This approach does, however, rely on a consistent formatting of the keys (eg "East,Banker,9AM-12PM" is not the same as "East, Banker, 9AM-12PM"), so you'd have to check for consistent formatting (and enforce it) while making the hash above.

Upvotes: 4

Related Questions