Reputation: 57
First refer to Stack Overflow question Using Spreadsheet::WriteExcel.
The data structure looks alike the following:
col1 col2 col3 col4 col5
row1 School 1
row2 Dean John
row3 No.stu. 55
row4 some irrelevant stuff
row5 School2 2
row6 Dean Tony
row7 No. stu. 60
row8 some irrelevant stuff
row9 School 3
row10 Dean James
row11 No.stu. 56
row12 No. teacher 20
row13 School 4
row14 Dean Tom
row15 No.stu. 79
row16 No. teacher 21
row17 course
row18 math 2
row19 eng 4
row20 teacher name age gender race
row21 Jane 20 female white
row22 student name Lee
row23 SAT 1434
row24 gender male
The output I would like to achieve is:
col1 col2 col3 col4 col5 col6 col7 col8 col9
row1 School Dean No.stu. No. teacher course_math course_eng teacher_name teacher_age teacher_gender teacher_race student_name student_SAT student_gender
row2 1 John 55
row3 2 Tony 60
row4 3 James 56 20
row5 4 Tome 79 21 2 4 Jane 20 female white Lee 1434 male
And thanks to gangabass, the code I have been given is that:
use strict;
use warnings;
use Spreadsheet::ParseExcel;
use FindBin qw($Bin);
my ($infile) = @ARGV;
my $parser = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse("$Bin/Test.xls");
die $parser->error unless defined $workbook;
my ($worksheet) = $workbook->worksheets();
my %data;
my $row = 0;
my $school = "";
while (1) {
my $cell = $worksheet->get_cell($row, 0);
last unless defined($cell);
my $key = $cell->value();
my $value = $worksheet->get_cell($row++, 1)->value();
if ($key eq "School") {
$school = $value;
next;
}
$data{$school}->{$key} = $value;
}
sleep 1;
I start from parsing row17-row19. The first problem I came across was the empty cell in (row17, col3). The code errors out when it arrives here. Knowing that an Excel cell differentiates between 'empty' and 'blank', I can play around with it by setting the cell format of the original XLS file to others except 'general'. However, it is just a temporary solution. I wonder if there are any commands that I could use to fetch the empty cell. I already tried unformatted()
by adding:
my $unformattedvalue = $worksheet->get_cell( $row++, 1 )->unformatted();
It, however, does not work out.
I then tried to specify the data structure under 'course' state using the following code:
my %data;
my $row = 0;
my $school = "";
my $course = ""; #Initial value for the state of course
while (1) {
my $cell = $worksheet->get_cell($row, 0);
last unless defined($cell);
my $key = $cell->value();
my $value = $worksheet->get_cell( $row++, 1 )->value();
my $value1 = $worksheet->get_cell( $row++, 2 )->value(); #Fetching the value in column 3
if ($key eq "School") {
$school = $value;
next;
}
if ($key eq "course") { #Just mimicking the how we construct the structure of 'School'
$course = $value1;
next;
}
$data{$school}->{$key} = $value; #Must be something wrong here, but can not figure out
}
The code does not go through and gives Can't call method "value" on an undefined value at xxx line of 'my $value1 = $worksheet->get_cell( $row++, 2 )->value()';
So in short, my questions are:
How do I parse a structure like following by using Spreadsheet::ParseExcel
row17 course
row18 math 2
row19 eng 4
Upvotes: 1
Views: 1945
Reputation: 57
Here is my homework:
I was simply adding the process for columns.
elsif ($state eq 'student') {
my $key = $cell1->value();
$data{$school}{$student}{$key} =$cell2->value();
} elsif ($state eq 'course') {
my $key = $cell1->value();
$data{$school}{$course}{$key} =$cell2->value();
where $cell2
is defined as:
my $cell2 = $worksheet->get_cell( $row, 2 );
It gives me:
$VAR1 = {
'4' => {
'course' => {
'math' => '2',
'eng' => '4'
},
'No.Stu' => '79',
'No.Teacher' => '21',
'Lee' => {
'SAT' => '1434',
'gender' => 'male'
},
'Dean' => 'Tom'
},
'1' => {
'No.Stu' => '55',
'Dean' => 'John'
},
'3' => {
'No.Stu' => '56',
'No.Teacher' => '20',
'Dean' => 'James'
},
'2' => {
'No.Stu' => '60',
'Dean' => 'Tony'
}
};
which looks good.
But I have no idea about how to process the column for teacher as it has some different structures:
col1 col2 col3 col4 col5
row20 teacher name age gender race
rwo21 Jane 20 female white
I tried to loop over the columns to fetch the cell values as keys but not succeed:
my ( $col_min, $col_max ) = $worksheet->col_range();
my $col = $col_min;
my $cell3 = $worksheet->get_cell(++$row, $col );
elsif ($state eq 'teacher') {
while ($col <= $col_max) {
my $key = $worksheet->get_cell($row, $col++ );
}
$data{$school}{$teacher}{$key} =$cell3->value();
}
Any suggestions?
Update: I follow what @imran suggested, and it worked pretty well and then I just rewrite what I have parsed into a new XLS file using the following code:
use Spreadsheet::WriteExcel;
$workbook = Spreadsheet::WriteExcel->new('Result.xls');
$worksheet = $workbook->add_worksheet();
$col = 0;
$row = 0;
$worksheet->write( $row++, $col,
["School", "No.Stu", "No.Teacher", "Dean","Course_math", "Course_eng", "student_SAT", "student_name", "student_gender", "teacher_race", "teacher_name", "teacher_age", "teacher_gender"] );
foreach my $school( sort keys %data ) {
$worksheet->write( $row++, $col,
[ $school, @{$data{$school}}{ "No.Stu", "No.Teacher", "Dean"}, @{$data{$school}{course}}{ "math", "eng"}, @{$data{$school}{student}}{ "SAT", "name", "gender"}, @{$data{$school}{teacher}}{ "race", "name", "age", "gender"} ]) ;
}
$workbook->close();
It all worked well except one minor thing. The cell for teacher_age
is empty, but it shows in the hash table that it has already been parsed. I can not figure that out...
The hashed teacher part is as follows:
'teacher' => {
'race' => 'white',
'name' => 'Jane',
'age ' => '25',
'gender' => ' female'
},
And the code I used is exactly what you suggested, but the cell for 'teacher_age' in re-formatted xls. file is empty.
Update 2: Back to the question about 'state', does it always start parsing from the subsequent row? What if I have the following data structure:
School 1
course math
eng
...
School 2
course phy
...
School 3
course chem
gym
music
If I use the old code and define a course state, I can only parse eng
from school 1
, gym and music
course from school 2
. Well, I was thinking to define the course state one row ahead but did not get any luck. Any suggestions?
Update:
I followed @imran's suggestion by changing the course code to:
} elsif ($key eq 'course') {
$state = 'course';
$course = $worksheet->get_cell( $row, 1 )->value();
}
And correspondingly the hash table as:
} elsif ($state eq 'course') {
my $key = $cell1->value();
$data{$school}{$course}{$key} =$cell1->value();
}
However, it can only parse
'1' => {
'math' => {
'eng' => 'eng'
},
}
'3' => {
'chem' => {
'gym' => 'gym',
'music' => 'music
},
}
And no parsing for school2
.
The problem has been solved by using simply proceeding course
right after it reaches the key. i.e. instead of having separate blocks of defining course keys
and proceeding the course table. I now have:
} elsif ($key eq 'course') {
$state = 'course';
$course = $state;
my $key = $cell1->value();
$data{$school}{$course}{$key} = $key;
}
And it does that trick. But I am still curious about how it works.
Upvotes: 0
Reputation: 1560
You are doing a $row++
when you shouldn't be.
Try this code instead as your starting point:
my %data;
my $state = "";
my $school = "";
my $student = "";
my ( $row_min, $row_max ) = $worksheet->row_range();
my $row = $row_min;
while ($row <= $row_max) {
my $cell0 = $worksheet->get_cell( $row, 0 );
my $cell1 = $worksheet->get_cell( $row, 1 );
if (defined($cell0)) {
my $key = $cell0->value();
if ($key eq 'School') {
$state = 'school';
$school = $cell1->value();
} elsif ($key eq 'course') {
$state = 'course';
} elsif ($key eq 'teacher') {
$state = 'teacher';
} elsif ($key eq 'student') {
$state = 'student');
$student = $worksheet->get_cell( $row, 2 )->value();
} else {
$data{$school}{$key} = $cell1->value();
}
} elsif ($state eq 'course') {
# process columns for course
} elsif ($state eq 'teacher') {
# process columns for teacher
} elsif ($state eq 'student') {
# process columns for student
}
$row++;
}
use Data::Dumper;
print Dumper(\%data);
Update:
To process the teacher rows, I would first declare a %teacher_columns hash outside the first while loop and also the $col_min and $col_max using the col_range() method. Then inside the $key eq 'teacher'
elsif clause, I would do something like this:
%teacher_columns = (); # clear it out in case column names are different for this school
for my $col (2 .. $col_max) {
my $cell = $worksheet->get_cell( $row, $col );
$teacher_columns{$cell->value()} = $col if defined($cell) and $cell->value();
}
Then in the state eq 'teacher'
elsif clause, I would do something like this:
foreach my $key (keys %teacher_columns) {
my $cell = $worksheet->get_cell($row, $teacher_columns{$key});
if (defined($cell)) {
# store cell data into proper location of your data hash
}
}
Upvotes: 2