tman
tman

Reputation: 425

perl hash of hash of arrays with looping

I have the following data in a hash of hash of arrays. The numerical data in the array represents that last 4 quarters of financial information.

I'd like to be able to iterate over the array, and pull out the data by quarter, to get it ready to insert into a database. With my code below I can get all of the quarters, or one quarter only if I specifically call it. When I try and add another loop to iterate over the hash of hash of arrays to return only the array subset values, i get all the values and I don't know what I'm doing wrong. See code: Thanks for the help

my %comp_info = (
  CompanyA => {
    COGS    => ["175.00", "155.00", "125.00", "125.00"],
    Revenue => ["300.00", "200.00", "250.00", "225.00"],
  },
)

# The following works, but I have to specifically push one array subset at a time, 
# which makes passing values to db_insert subroutine ineffective.  
# (Id have to have 4 or 5 calls depending on number of quarters of data in each record).

sub get_insert_arrays {
  foreach $comp (keys %comp_info ) {
     foreach $column ( keys %{$comp_info{$comp}} ) {
        push (@insert_array, @{$comp_info{$sym}{$column}}[0] );
     }
  }
  my $valuelist = join(", ", @insert_array);
  &db_insert($valuelist);
  undef @insert_array;
}

#try to loop through, fails, I get all of the data in @insert_array instead of only one quarter.
sub get_insert_arrays {
  foreach $comp (keys %comp_info ) {
     foreach $column ( keys %{$comp_info{$comp}} ) {
        for my $i ( 0 .. $#{$comp_info{$comp}{$column}} ) {
           push (@insert_array, @{$comp_info{$sym}{$column}}[$i] ); 
        }
     }
    my $valuelist = join(", ", @insert_array);
    &db_insert($valuelist);
    undef @insert_array;  
    undef $valuelist;   
  }
}

Upvotes: 1

Views: 212

Answers (2)

Borodin
Borodin

Reputation: 126722

Your recent addition - undef @insert_array; undef $valuelist; is a misuse of undef. Putting it before a variable like that forces a garbage collection cycle, which is something you don't want to do -- it is best to let Perl look after things itself.

Arrays should be emptied with @insert_array = () rather than using undef, and for scalars you should $valuelist = undef. But these variables are irrelevant outside the subroutine, so you should declare them inside, in which case there is no need to reinitialise them in the first place.

Please bear in mind what I said about calling prepare on an SQL statement with placeholders. Your code should look something like this

my $insert = $dbh->prepare('INSERT INTO table VALUES (?, ?)');

and later

my @insert_array = (175.00, 300.00);
$insert->execute(@insert_array);

However I have written this, which I think does what you want, to create a $valuelist string as your own code does. Since you don't need the hash keys, it is much tidier to iterate over the values instead. The db_insert subroutine is a dummy that just prints the values of the parameter passed to it.

use strict;
use warnings;
use 5.010;

my %comp_info = (
  CompanyA => {
    COGS    => ["175.00", "155.00", "125.00", "125.00"],
    Revenue => ["300.00", "200.00", "250.00", "225.00"],
  },
);

my @values = map values %$_, values %comp_info;

for my $i (0 .. $#{$values[0]}) {
  my @insert = map $_->[$i], @values;
  db_insert(join ', ', @insert);
}

sub db_insert {
  say "db_insert('@_')";
}

output

db_insert('175.00, 300.00')
db_insert('155.00, 200.00')
db_insert('125.00, 250.00')
db_insert('125.00, 225.00')

Update

To comply with the new specification:

use strict;
use warnings;
use 5.010;

my %comp_info = (
  CompanyA => {
    COGS    => ["175.00", "155.00", "125.00", "125.00"],
    Revenue => ["300.00", "200.00", "250.00", "225.00"],
  },
  CompanyB => {
    COGS    => ["175.00", "155.00", "125.00", "125.00"],
    Revenue => ["300.00", "200.00", "250.00", "225.00"],
  },
  CompanyC => {
    COGS    => ["175.00", "155.00", "125.00", "125.00"],
    Revenue => ["300.00", "200.00", "250.00", "225.00"],
  },
);

my @columns = qw/ COGS Revenue /;

for my $comp (keys %comp_info) {
  my $data = $comp_info{$comp};
  for my $i (0 .. $#{(values %$data)[0]}) {
    my @values = ( $comp, map $_->[$i], @{$data}{@columns} );
    db_insert(join ', ', @values);
  }
}

sub db_insert {
  say "db_insert('@_')";
}

output

db_insert('CompanyC, 175.00, 300.00')
db_insert('CompanyC, 155.00, 200.00')
db_insert('CompanyC, 125.00, 250.00')
db_insert('CompanyC, 125.00, 225.00')
db_insert('CompanyA, 175.00, 300.00')
db_insert('CompanyA, 155.00, 200.00')
db_insert('CompanyA, 125.00, 250.00')
db_insert('CompanyA, 125.00, 225.00')
db_insert('CompanyB, 175.00, 300.00')
db_insert('CompanyB, 155.00, 200.00')
db_insert('CompanyB, 125.00, 250.00')
db_insert('CompanyB, 125.00, 225.00')

Upvotes: 1

David W.
David W.

Reputation: 107040

I highly recommend to dereference with intermediate variables, and also using the -> syntax. Both of these help you figure out what's going on:

Here's your first subroutine using dereferencing:

sub get_insert_arrays {
    my @insert_array;

    foreach $comp (keys %comp_info ) {      # References to a hash (Revenue, COGS)
        %columns = %{ $comp_info{$comp} };  # Dereference
        foreach $column ( keys %columns ) { # Reference to an Array (quarter data)
            my @values = @{ $column } ;     # Dereference
            push (@insert_array, $column );
            my $valuelist = join(", ", @insert_array);
            &db_insert($valuelist);
        }
    }
}

Hmm... Looking at this, it's easy to see that I can simply do:

    for my $comp (keys %comp_info ) {      # References to a hash (Revenue, COGS)
        %columns = %{ $comp_info{$comp} };  # Dereference
        for my $column ( keys %columns ) { # Reference to an Array (quarter data)
            my @values = @{ $column } ;     # Dereference
            db_insert(@values);
        }
    }
}

If you need to see a particular piece of data, use the -> syntax to simplify your structure:

${$comp_info{$sym}{$column}}[$i];   # You had "@". Should be "$".

vs.

$comp_info{$sym}->{$column}->[$i];

Much easier to read.

Also use the warnings and strict pragmas in your program. It'll catch a lot of errors that may include undefined variables, and misspelled variable names.

If you're pulling out data quarter-by-quarter, you probably want the COGS to be with the Revenue column:

#! /usr/bin/env perl
#

use strict;             # Lets you know when you misspell variable names
use warnings;           # Warns of issues (using undefined variables
use feature qw(say);

#
# Just initializing the data you had
#
my %comp_info = ( CompanyA => {
        Revenue => [
            300.00, 200.00, 250.00, 225.00
        ],
        COGS => [
            175.00, 155.00, 125.00, 125.00
        ],
    },
);

#
# Go through each company
#
for my $company ( keys %comp_info ) {
    my @revenues = @{ $comp_info{$company}->{Revenue} };  # Dereference
    my @cogss    = @{ $comp_info{$company}->{COGS} };     # Dereferenec
    say "Company: $company";
    #
    # I know the keys are "Revenue" and "COGS", so I don't need a loop.
    # I'll just go right to my quarters data. Note that dereferencing
    # makes my program a lot easier to write and maintain
    #
    for my $quarter ( (0..3) ) {
        my $revenue = $revenues[$quarter];
        my $cogs    = $cogss[$quarter];
        my $profit = $revenue - $cogs;
        say "    Quarter: " . ($quarter - 1)
             . " Revenue = $revenue  COGS = $cogs Profit = $profit";
    }
}

How you do your database inserts is up to you. But, you can see how doing a bit of dereferencing and using -> can clarify what you're looking at.


Addendum

how do I pull out just the quarter by quarter data without having to specify revenue, cogs, etc. in some cases there could be 30+ fields, so I don't want to have to specify each field in the program. I just want to grab all Q1 fields, insert, grab all Q2 field, insert, etc.

So two loops:

  • Like before, I have my outer loop go through for each company which just happens to be the key for that %comp_info hash.
  • Each value in the %comp_info hash is a reference to another hash that is keyed by the data type (COGS, Revenue, etc.). Again, I simply loop through the keys of that inner hash (after dereferencing to make it easier to understand).
  • Now that I have the company name (the key to that %comp_info hash, and a list of the keys in that inner hash, I can simply pull up the first quarter numbers for each company and each data type. Getting the quarter value is simple: $comp_info{$company}->{$type}->[$quarter]. Note that there is three levels of data, and I have three sections in my variable with each section separated by ->.
    • I can see that the outer most section is a simple hash that's keyed by the company name: ($comp_info{$company}).
    • This %comp_info hash points to a hash reference (->{type}) which is keyed by the data types (COGS, Revenue, etc.).
    • And that hash reference points to an array reference for each quarter (->[$quarter]). See how that works and why I like that -> syntax? It makes it very clear what I am working with.
    • This is just the first quarter results. If I wanted to go through each and every quarter, I could have an outer loop for my $quarter (0..3) {.

Here's what it looks like. This is a complete program, so you could cut this out, and try running it yourself and see if you can figure out what's going on.

use strict;             # Lets you know when you misspell variable names
use warnings;           # Warns of issues (using undefined variables
use feature qw(say);

my $quarter = 0;        #First Quarter is 0. Last quarter is 3
my %comp_info = ( CompanyA => {
        Revenue => [
            300.00, 200.00, 250.00, 225.00
        ],
        COGS => [
            175.00, 155.00, 125.00, 125.00
        ],
    },
);

for my $company ( keys %comp_info ) {
    say "Company: $company";
    %types = %{ $company_info{$company} };
    for my $type ( keys %types ) {   # COGS, Revenue, etc.
        say "   $type for quarter " 
            . ($quarter + 1) . ": "
            . $comp_info{$company}->{$type}->[$quarter];
    }
}

One more time for each quarter to insert the data into your database:

Using use strict and declaring variables with my means that variables are only valid for a limited scope. That my @type_data; declares a array that holds my type values for inserting into your database. However, since it's declared inside that for my $quarter loop, the array and its values disappears with each iteration of the loop. No need to have to remove the data or reinitialize the variable. It does it all by itself!

Look up how Lexically Scoped Variables work.

for my $quarter ( (0..3) ) {
    my @type_values;
    for my $company ( keys %comp_info ) {
        my %types = %{ $comp_info{$company} };
        for my $type ( keys %types ) {   # COGS, Revenue, etc.
            push @type_values, $comp_info->{$company}->{$type}->{quarter};
    }
    insert_data( @type_values );  # Database insert you wanted
}

Upvotes: 1

Related Questions