user2198367
user2198367

Reputation: 57

Fetching zero cells in XLS file using Spreadsheet::ParseExcel

First refer to Stack Overflow question Using Spreadsheet::WriteExcel.

The data structure looks alike the following:

      col1      col2    col3   col4   col5
row1  School    1
row2  Dean      John
row3  No.stu.   55
row4  some irrelevant stuff
row5  School2   2
row6  Dean      Tony
row7  No. stu.  60
row8  some irrelevant stuff
row9  School    3
row10 Dean      James
row11 No.stu.   56
row12 No. teacher 20
row13 School    4
row14 Dean      Tom
row15 No.stu.   79
row16 No. teacher 21
row17 course
row18           math    2
row19           eng     4
row20 teacher   name    age   gender   race
row21           Jane    20    female   white
row22 student   name    Lee
row23           SAT     1434
row24           gender  male

The output I would like to achieve is:

      col1  col2  col3     col4          col5         col6          col7        col8       col9
row1 School Dean No.stu.  No. teacher  course_math  course_eng  teacher_name  teacher_age  teacher_gender    teacher_race    student_name   student_SAT   student_gender
row2 1      John  55
row3 2      Tony  60
row4 3      James 56       20
row5 4      Tome  79       21              2           4            Jane        20          female                white         Lee         1434          male

And thanks to gangabass, the code I have been given is that:

use strict;
use warnings;

use Spreadsheet::ParseExcel;
use FindBin qw($Bin);

my ($infile) = @ARGV;

my $parser   = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse("$Bin/Test.xls");
die $parser->error unless defined $workbook;
my ($worksheet) = $workbook->worksheets();

my %data;
my $row    = 0;
my $school = "";
while (1) {
    my $cell = $worksheet->get_cell($row, 0);
    last unless defined($cell);

    my $key = $cell->value();
    my $value = $worksheet->get_cell($row++, 1)->value();

    if ($key eq "School") {

        $school = $value;
        next;
    }

    $data{$school}->{$key} = $value;
}
sleep 1;

I start from parsing row17-row19. The first problem I came across was the empty cell in (row17, col3). The code errors out when it arrives here. Knowing that an Excel cell differentiates between 'empty' and 'blank', I can play around with it by setting the cell format of the original XLS file to others except 'general'. However, it is just a temporary solution. I wonder if there are any commands that I could use to fetch the empty cell. I already tried unformatted() by adding:

my $unformattedvalue = $worksheet->get_cell( $row++, 1 )->unformatted();

It, however, does not work out.

I then tried to specify the data structure under 'course' state using the following code:

my %data;
my $row    = 0;
my $school = "";
my $course = ""; #Initial value for the state of course
while (1) {
    my $cell = $worksheet->get_cell($row, 0);
    last unless defined($cell);

    my $key = $cell->value();
    my $value  = $worksheet->get_cell( $row++, 1 )->value();
    my $value1 = $worksheet->get_cell( $row++, 2 )->value(); #Fetching the value in column 3

    if ($key eq "School") {

        $school = $value;
        next;
    }

    if ($key eq "course") { #Just mimicking the how we construct the structure of 'School'

         $course = $value1;
         next;
    }

    $data{$school}->{$key} = $value;  #Must be something wrong here, but can not figure out
}

The code does not go through and gives Can't call method "value" on an undefined value at xxx line of 'my $value1 = $worksheet->get_cell( $row++, 2 )->value()';

So in short, my questions are:

  1. In general, how do I fetch empty cells in XLS without intervening the process?
  2. How do I parse a structure like following by using Spreadsheet::ParseExcel

    row17 course
    row18           math    2
    row19           eng     4
    

Upvotes: 1

Views: 1945

Answers (2)

user2198367
user2198367

Reputation: 57

Here is my homework:

I was simply adding the process for columns.

elsif ($state eq 'student') {
        my $key = $cell1->value();
        $data{$school}{$student}{$key} =$cell2->value();
}  elsif ($state eq 'course') {
       my $key = $cell1->value();
        $data{$school}{$course}{$key} =$cell2->value();

where $cell2 is defined as:

my $cell2 = $worksheet->get_cell( $row, 2 );

It gives me:

$VAR1 = {
          '4' => {
                   'course' => {
                                 'math' => '2',
                                 'eng' => '4'
                               },
                   'No.Stu' => '79',
                   'No.Teacher' => '21',
                   'Lee' => {
                              'SAT' => '1434',
                              'gender' => 'male'
                            },
                   'Dean' => 'Tom'
                 },
          '1' => {
                   'No.Stu' => '55',
                   'Dean' => 'John'
                 },
          '3' => {
                   'No.Stu' => '56',
                   'No.Teacher' => '20',
                   'Dean' => 'James'
                 },
          '2' => {
                   'No.Stu' => '60',
                   'Dean' => 'Tony'
                 }
        };

which looks good.

But I have no idea about how to process the column for teacher as it has some different structures:

        col1      col2    col3   col4   col5
  row20 teacher   name    age   gender   race
  rwo21           Jane    20    female   white

I tried to loop over the columns to fetch the cell values as keys but not succeed:

my ( $col_min, $col_max ) = $worksheet->col_range();
my $col = $col_min;
my $cell3 = $worksheet->get_cell(++$row, $col );

elsif ($state eq 'teacher') {
        while ($col <= $col_max) {
            my $key = $worksheet->get_cell($row, $col++ );
        }
           $data{$school}{$teacher}{$key} =$cell3->value();
    }

Any suggestions?

Update: I follow what @imran suggested, and it worked pretty well and then I just rewrite what I have parsed into a new XLS file using the following code:

use Spreadsheet::WriteExcel;

$workbook = Spreadsheet::WriteExcel->new('Result.xls');
$worksheet = $workbook->add_worksheet();

$col = 0;
$row = 0;

$worksheet->write( $row++, $col,
    ["School", "No.Stu", "No.Teacher", "Dean","Course_math", "Course_eng", "student_SAT", "student_name", "student_gender", "teacher_race", "teacher_name", "teacher_age", "teacher_gender"] );

foreach my $school( sort keys %data ) {

     $worksheet->write( $row++, $col,
        [ $school, @{$data{$school}}{ "No.Stu", "No.Teacher", "Dean"}, @{$data{$school}{course}}{ "math", "eng"}, @{$data{$school}{student}}{ "SAT", "name", "gender"}, @{$data{$school}{teacher}}{ "race", "name", "age", "gender"} ]) ;
}

$workbook->close();

It all worked well except one minor thing. The cell for teacher_age is empty, but it shows in the hash table that it has already been parsed. I can not figure that out...

The hashed teacher part is as follows:

           'teacher' => {
                          'race' => 'white',
                          'name' => 'Jane',
                          'age ' => '25',
                          'gender' => ' female'
                        },

And the code I used is exactly what you suggested, but the cell for 'teacher_age' in re-formatted xls. file is empty.

Update 2: Back to the question about 'state', does it always start parsing from the subsequent row? What if I have the following data structure:

 School    1
 course    math
           eng
 ...
 School     2
 course    phy
 ...
 School     3
 course    chem
           gym
           music

If I use the old code and define a course state, I can only parse eng from school 1, gym and music course from school 2. Well, I was thinking to define the course state one row ahead but did not get any luck. Any suggestions?

Update:

I followed @imran's suggestion by changing the course code to:

} elsif ($key eq 'course') {
            $state = 'course';
            $course = $worksheet->get_cell( $row, 1 )->value();
}

And correspondingly the hash table as:

} elsif ($state eq 'course') {
            my $key = $cell1->value();
            $data{$school}{$course}{$key} =$cell1->value();
}

However, it can only parse

 '1' => {
                   'math' => {
                               'eng' => 'eng'
                             },
       }
 '3' => {
   'chem' => {
                               'gym' => 'gym',
                               'music' => 'music
                             },
      }

And no parsing for school2.

The problem has been solved by using simply proceeding course right after it reaches the key. i.e. instead of having separate blocks of defining course keys and proceeding the course table. I now have:

  } elsif ($key eq 'course') {
            $state = 'course';
            $course = $state;
            my $key = $cell1->value();
            $data{$school}{$course}{$key} = $key;
}

And it does that trick. But I am still curious about how it works.

Upvotes: 0

imran
imran

Reputation: 1560

You are doing a $row++ when you shouldn't be.

Try this code instead as your starting point:

my %data;
my $state = "";
my $school = "";
my $student = "";
my ( $row_min, $row_max ) = $worksheet->row_range();
my $row = $row_min;
while ($row <= $row_max) {
    my $cell0 = $worksheet->get_cell( $row, 0 );
    my $cell1 = $worksheet->get_cell( $row, 1 );

    if (defined($cell0)) {
        my $key = $cell0->value();
        if ($key eq 'School') {
            $state = 'school';
            $school = $cell1->value();
        } elsif ($key eq 'course') {
            $state = 'course';
        } elsif ($key eq 'teacher') {
            $state = 'teacher';
        } elsif ($key eq 'student') {
            $state = 'student');
            $student = $worksheet->get_cell( $row, 2 )->value();
        } else {
            $data{$school}{$key} = $cell1->value();
        }
    } elsif ($state eq 'course') {
        # process columns for course
    } elsif ($state eq 'teacher') {
        # process columns for teacher
    } elsif ($state eq 'student') {
        # process columns for student
    }
    $row++;
}
use Data::Dumper;
print Dumper(\%data);

Update:

To process the teacher rows, I would first declare a %teacher_columns hash outside the first while loop and also the $col_min and $col_max using the col_range() method. Then inside the $key eq 'teacher' elsif clause, I would do something like this:

%teacher_columns = (); # clear it out in case column names are different for this school
for my $col (2 .. $col_max) {
    my $cell = $worksheet->get_cell( $row, $col );
    $teacher_columns{$cell->value()} = $col if defined($cell) and $cell->value();
}

Then in the state eq 'teacher' elsif clause, I would do something like this:

foreach my $key (keys %teacher_columns) {
    my $cell = $worksheet->get_cell($row, $teacher_columns{$key});
    if (defined($cell)) {
      # store cell data into proper location of your data hash
    }
}

Upvotes: 2

Related Questions