Storing data into array of hashes

Question

I have a school program I just got, and we are learning hashes. The teacher went over hashes of arrays but not really arrays of hashes, and I feel like an AoH is going to work out better for me in the long run. Right now I get all my data into separate variables, and I want to store them in an AoH because I have the same variables the entire time, but the values change.

The program is a log analyzer: it parses through a gigantic log file, and every line of data looks like this:

IPADDY x x [DATE:TIME -x] "METHOD URL HTTPVERS" STATUSCODE BYTES "REFERER" "USERAGENT"

An example line:

27.112.105.20 - - [09/Oct/2011:07:22:51 -0500] "GET / HTTP/1.1" 200 4886 "-" "Python-urllib/2.4"

I get all the data fine; I just don't really understand how to populate an array of hashes. Can anyone help me out?

Here is updated code that grabs the data and tries to store it in an AoH. The output file used to be perfect, matching the print statements that are now commented out. Now all that ends up in my output file is "ARRAY(0x2395df0): HASH(0x23d06e8)". Am I doing something wrong?

#!/usr/bin/perl
use strict;
use warnings;

my $j = 0;
my @arrofhash;
my $ipadd;
my $date;
my $time;
my $method;
my $url;
my $httpvers;
my $statuscode;
my $bytes;
my $referer;
my $useragent;
my $dateANDtime;
my ($dummy1, $dummy2, $dummy3);

open ( MYFILE, '>>dodoherty.report');

if ( @ARGV < 1)
{
        printf "\n\tUsage: $0 file word(s)\n\n";
        exit 0;
}

for (my $i = 0; $i < @ARGV; ++$i)
{
    open( HANDLE, $ARGV[$i]);
    while( my $line = <HANDLE> )
    {

            ($ipadd, $dummy1, $dummy2, $dateANDtime, $dummy3, $method, $url, $httpvers, $statuscode, $bytes, $referer, $useragent) = split( /\s/, $line);
            $method = substr ($method, 1, length($method));
            $httpvers = substr ($httpvers, 0, length($httpvers)-1);
            $referer = substr ($referer, 1, length($referer)-2);
            $useragent = substr ($useragent, 1, length($useragent)-1);
            if ( substr ($useragent, length($useragent)-1, length($useragent)) eq '"')
            {
                    chop $useragent;
            }
            if ( $dateANDtime =~ /\[(\S*)\:(\d{2}\:\d{2}\:\d{2})/)
            {
                    $date = $1;
                    $time = $2;
            }

            $arrofhash[$i] = {ipadd => $ipadd, date => $date, 'time' => $time, method => $method, url => $url, httpvers => $httpvers, statuscode => $statuscode, bytes => $bytes, referer => $referer, useragent => $useragent};

#               print MYFILE "IPADDY :$ipadd\n";
#               print MYFILE "METHOD :$method\n";
#               print MYFILE "URL :$url\n";
#               print MYFILE "HTTPOVERS : $httpvers\n";
#               print MYFILE "STATUS CODE: $statuscode\n";
#               print MYFILE "BYTES : $bytes\n";
#               print MYFILE "REFERER : $referer\n";
#               print MYFILE "USERAGENT : $useragent\n";
#               print MYFILE "DATE : $date\n";
#               print MYFILE "TIME : $time\n\n";

    }
}

for ( my $j = 0; $j < @arrofhash; ++$j)
{
    foreach my $hash (@hashkeys)
    {
            printf MYFILE "%s: %s\n",$hash, $arrofhash[$j];
    }
    print MYFILE "\n";
}


close (MYFILE);

TLP · Accepted Answer

A common beginner mistake is to ignore the lexical scope of variables and just declare all of them at the top, as you do. Declare each variable within the scope where you need it, no more, no less.
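Scope matters most when you push references: the answer's loop declares my %f inside the loop, so every line gets its own hash. A small illustration, with hypothetical names (%row, @fresh, @shared), of what happens when the hash is declared once at the top instead:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Declared inside the loop: each iteration gets a brand-new %row,
# so every pushed reference points to a different hash.
my @fresh;
for my $n (1 .. 3) {
    my %row = (n => $n);
    push @fresh, \%row;
}

# Declared once outside the loop: every reference points to the SAME hash,
# which ends up holding only the last values written to it.
my @shared;
my %row;
for my $n (1 .. 3) {
    %row = (n => $n);
    push @shared, \%row;
}

print join(',', map { $_->{n} } @fresh),  "\n";   # 1,2,3
print join(',', map { $_->{n} } @shared), "\n";   # 3,3,3
```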

In your case, it would be beneficial to store the data directly in a hash, then push a reference to that hash onto an array. I would also advise against using split here: splitting on whitespace also splits the quoted fields (a browser user agent such as "Mozilla/5.0 (X11; Linux x86_64)" contains spaces), and you end up needing dummy variables and substr calls to strip the unwanted pieces. Use a regex instead.
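A quick check of the split problem, using the question's example line plus a hypothetical line whose user agent contains spaces. The question's code assigns the split result to a list of 12 variables, so anything past the 12th field is silently lost:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The example line from the question: no spaces inside the quoted fields.
my $simple = '27.112.105.20 - - [09/Oct/2011:07:22:51 -0500] "GET / HTTP/1.1" 200 4886 "-" "Python-urllib/2.4"';

# Hypothetical line: same request, but a user agent with spaces in it.
my $browser = '27.112.105.20 - - [09/Oct/2011:07:22:51 -0500] "GET / HTTP/1.1" 200 4886 "-" "Mozilla/5.0 (X11; Linux x86_64)"';

my @simple_fields  = split /\s/, $simple;
my @browser_fields = split /\s/, $browser;

print scalar(@simple_fields),  "\n";   # 12: one per variable in the question's list
print scalar(@browser_fields), "\n";   # 15: three fields too many
print $browser_fields[11], "\n";       # '"Mozilla/5.0': only the first word of the user agent
```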

This regex won't handle escaped quotes inside quotes, but I get the feeling that you will not have to deal with that, since you were using split before to handle this.
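If escaped quotes ever do show up, one common alternative for a quoted field is to match a run of characters that are either not a quote or backslash, or are a backslash-escaped pair. A sketch on a hypothetical field value:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical quoted field containing escaped quotes.
my $field = '"Mozilla/5.0 \"test\" agent"';

# The answer's pattern stops at the first quote, escaped or not.
my ($simple) = $field =~ /"([^"]+)"/;

# Escape-aware pattern: non-quote/non-backslash characters, or \ followed by anything.
my ($escaped) = $field =~ /"((?:[^"\\]|\\.)*)"/;

print "$simple\n";    # Mozilla/5.0 \
print "$escaped\n";   # Mozilla/5.0 \"test\" agent
```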

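Any further processing of the fields (extracting the date and time, etc.) belongs inside the loop, before the push. If you want some added safety, you can warn and skip a line when the regex fails to match, placing a check right after the regex assignment, e.g. unless (defined $f{ipadd}) { warn "Warning: Regex did not match line: '$_'"; next; }. Testing unless (%f) would not work here: the hash-slice assignment creates all seven keys (with undef values) even when the match fails, so %f is never empty.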
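A sketch of where the match check and the date/time split go in the loop, assuming the same field names as the answer's code. The two sample lines (one deliberately malformed) stand in for the log file, and the regex is the answer's pattern written on one line without /x:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = (
    'this line is not in the access-log format',
    '27.112.105.20 - - [09/Oct/2011:07:22:51 -0500] "GET / HTTP/1.1" 200 4886 "-" "Python-urllib/2.4"',
);

my @all;
for (@lines) {
    my %f;    # a fresh hash for each line
    @f{qw(ipadd dateANDtime method statuscode bytes referer useragent)} =
        /^(\S+)[\s-]*\[([^]]+)\]\s*"([^"]+)"\s*(\d+)\s*(\d+)\s*"([^"]+)"\s*"([^"]+)"\s*$/;

    # On a failed match the slice still creates all seven keys (undef values),
    # so test a captured field rather than the hash itself.
    unless (defined $f{ipadd}) {
        warn "Warning: Regex did not match line: '$_'\n";
        next;
    }

    # Further processing goes here, before the push.
    @f{qw(date time)} = split /:/, $f{dateANDtime}, 2;
    push @all, \%f;
}

print scalar(@all), " line(s) stored\n";    # 1 line(s) stored
print "$all[0]{date} | $all[0]{time}\n";    # 09/Oct/2011 | 07:22:51 -0500
```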

use strict;
use warnings;
use Data::Dumper;

my @all;
while (<DATA>) {
    my %f;                 # make a new hash for each line
                           # assign the regex captures to a hash slice
    @f{qw(ipadd dateANDtime method statuscode bytes referer useragent)} = 
        /^                 # at beginning of line...
            (\S+) [\s-]*   # capture non-whitespace and ignore whitespace/dash
            \[([^]]+)\]\s* # capture what's inside brackets
            "([^"]+)"\s*   # capture what's inside quotes
            (\d+)\s*       # capture digits
            (\d+)\s*
            "([^"]+)"\s*
            "([^"]+)"\s* 
        $/x;               # ..until end of line, /x for regex readability only
    @f{qw(date time)} = split /:/, $f{dateANDtime}, 2;   # split date from time, inside the loop where %f exists
    push @all, \%f;        # store hash in array
}

print Dumper \@all;        # show the structure you've captured

__DATA__
27.112.105.20 - - [09/Oct/2011:07:22:51 -0500] "GET / HTTP/1.1" 200 4886 "-" "Python-urllib/2.4"
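
As for the output in the question: $arrofhash[$i] indexes by file number, so every line of a file overwrites the previous record (push appends instead), and printf with a reference prints the reference itself, giving strings like HASH(0x23d06e8). To reach the data, dereference with ->. A sketch, where the sample record and the format_record helper are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample: one record shaped like the ones the answer's loop stores.
my @all = (
    { ipadd => '27.112.105.20', method => 'GET / HTTP/1.1',
      statuscode => 200, bytes => 4886, useragent => 'Python-urllib/2.4' },
);

# Choose the output order yourself; keys %$rec come back in no particular order.
my @fields = qw(ipadd method statuscode bytes useragent);

# Hypothetical helper: one "name: value" line per requested field.
sub format_record {
    my ($rec, @names) = @_;    # $rec is a hash reference
    return join '', map { sprintf "%s: %s\n", $_, $rec->{$_} } @names;
}

for my $rec (@all) {
    print format_record($rec, @fields), "\n";    # blank line between records
}
```

In the question's script the same loop would print to MYFILE instead of STDOUT.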
