Reputation: 314

Unable to get desired hash for CSV format in perl

I have exported most common emp table as CSV. I would like to take each column name as hash key and store the values in an array.

Below is the code

 use Data::Dumper;
    open($fh,"<EMP.csv");
    %hash = ();

    local $/= "\n";

    while(<$fh>){

         @columnNames = split(/,/,$_) if $. ==1;     
         @columnValues = split(/,/,$_);  
          push @{hash->{@columnNames}} ,@columnValues;       
    }

    print Dumper(\%hash);

when I try to print the hash I am getting this

$VAR1 = {
          '8' => [
                   '"EMPNO"',
                   '"ENAME"',
                   '"JOB"',
                   '"MGR"',
                   '"HIREDATE"',
                   '"SAL"',
                   '"COMM"',
                   '"DEPTNO"
',
                   '"7839"',
                   '"KING"',
                   '"PRESIDENT"',
                   '""',
                   '"11/17/1981"',
                   '"5000"',
                   '""',
                   '"10"
',

But I'm expecting this instead

$VAR1 = { '"EMPNO"'=>[12,3,4,5,6,7,8,9],
          '"EMPNAME"'=>["pavan","kumar"...],

};

Upvotes: 4

Answers (3)

David W.

Reputation: 107040

Here's your problem:

push @{hash->{@columnNames}} ,@columnValues;

You're trying to use @columnNames as a key in your hash. Perl automatically takes this in a scalar context, thus, giving you a key of 8 since there are eight values in the array.

What you want to do is treat the first row in your CSV (which contains the column names) as special since these will be keys to your array.

my @column_names = read_csv_row; #Column names
my %employee_hash;
for my $column ( @column_names ) {
    $employee_hash{$column} = [];
}

This will give you a hash, keyed by column name to references to arrays. You now have to read in each line of your CSV table, and push each field into its correct column hash;

while ( my @employee_fields = read_csv_row ) {   #Your employee record is in @employee
    for my $field_num ( 0..$#employee_fields) {
        push @{ $employee_hash{$column_names[$field_num] }, $employee_fields[$field_num];
    }
}

What this is doing is taking each field from the CSV row and pushing it into its correct array reference in the %employee_hash. I'm taking advantage that @column_names is in the same order as each row. Thus, $column_names[$field_number] is the correct hash key, and that should correspond to the $employee_fields[$field_num].

However, the structure you said in your post is probably not what you really want. What you want is something like this:

%VAR = {
           7839 =>   {
                        ENAME    => "KING",
                        JOB      => "PRESIDENT",
                        MGR      => "",
                        HIREDATE => "11/17/1981",
                        SAL      => "5000",
                        COMM     => "",
                        DEPTNO   => "10",
                     }
      }

This wil key each employee by their employee number, and all of the related employee fields will be part of that value. Then you can talk about Employee number 7839's job title as being $employee{7839}->{JOB} and that employee's name being $employee{7839}->{NAME}. This way, all of the information about each employee is together in a single record:

use warnings;
use strict;
use Data::Dumper;
use feature qw(say);

my @column_names = read_csv_row(); #Column name
my %employee_hash;
while ( my @minion_fields = read_csv_row() ) {   #Your employee record is in @employee
    my %minion_hash;
    my $minion_number = $minion_fields[0];
    for my $field_num ( 1..$#minion_fields) {
        $minion_hash{ $column_names[$field_num] } = $minion_fields[$field_num];
    }
    $employee_hash{$minion_number} = \%minion_hash;
}

sub read_csv_row {
    my $row = <DATA> or return;
    chomp $row;
    return split /,\s+/, $row;
}

say Dumper \%employee_hash;
__DATA__
empno, name, job, mgr, hiredate, sal, comm, deptno
7839, king, president, , 11/17/1981, 5000, , 10
1234, prince, vice-president, , 10/1/1980, 3000, , 10

~~By the way, I haven't tested this code yet. (I'll do that now, and make needed corrections).~~ You may want to play around with Text::CSV which will be a better way of reading in CSV files, and may even help you create these structures (I haven't used it in a long, long time, so I don't remember everything it does). However, I believe that you'll find making your employee structure a hash of hashes with the initial hash keyed by employee number, and the sub-hashes keyed by fields a lot better than a hash of arrays.

Upvotes: 1

TLP

Reputation: 67908

You are trying to use a slice in a push statement, and that will not work. The array will be in scalar context, which is why you see the key 8. You would need to loop over the keys to push values onto the arrays. However, why do it that way?

You can use the Text::CSV module for this, which is rather simple and probably more appropriate, assuming you have a real csv format.

use strict;
use warnings;
use Data::Dumper;
use Text::CSV;

my $csv = Text::CSV->new({
        binary  => 1,
        eol     => $/,
    });
my %data;
open my $fh, "<", "yourfile.csv" or die $!;
$csv->column_names ($csv->getline($fh));          # get header names

while (my $row = $csv->getline_hr($fh)) {         # get hashref with values
    for my $key (keys %$row) {
        push @{$data{$key}}, $row->{$key};        # store values
    }
}
print Dumper \%data;

Upvotes: 5

gangabass

Reputation: 10666

Rewrite this line

push @{hash->{@columnNames}} ,@columnValues;

to:

foreach my $columnName (@columnNames) {
    my $columnValue = shift @columnValues;
    push @{ $hash{$columnName} }, $columnValue;
}

Upvotes: 1

Unable to get desired hash for CSV format in perl

Answers (3)

Related Questions