Jamie
Jamie

Reputation: 83

Code to convert .XLSX spreadsheet to CSV is not working as expected

I have an Excel spreadsheet with a number of columns and rows. I want to turn this into a CSV, but only include certain columns in the CSV.

Here is my code:

#!/usr/bin/perl
use strict;
use warnings;
use Spreadsheet::ParseXLSX;
use Excel::Writer::XLSX;

my $excel = Spreadsheet::ParseXLSX -> new();
my $workbook = $excel->parse('headers.xlsx');
my $worksheet = $workbook->worksheet(0);
my $destination = 'csv.txt';
my $csv;
my ($row_min, $row_max) = $worksheet->row_range();
my ($col_min, $col_max) = $worksheet->col_range();

for my $row ( $row_min .. $row_max ) {
    for my $col ( $col_min .. $col_max ) {

        # my $cell = $worksheet>{Cells}[$row][$col];
        my $cell = $worksheet->get_cell( $row, $col );
        if ($cell) {
            if ($col eq ( 'A' || 'S' || 'T' || 'AA' || 'AX' || 'BC' || 'D' || 'AN' || 'AV' )) {
                if ($col == $col_max) {
                    $csv .= $cell->Value . "\n";
                    } else {
                    $csv .= $cell->Value . ",";
                }
            }
        }
    }
}

open (my $fh, '>', $destination) or die '$! error trying to write';
print $fh $csv;
close ($fh);

I don't get any errors or warnings when I run it, which doesn't help to identify the problem.

Can anyone identify any errors?

Upvotes: 0

Views: 281

Answers (2)

Borodin
Borodin

Reputation: 126722

Your test

$col eq ( 'A' || 'S' || 'T' || 'AA' || 'AX' || 'BC' || 'D' || 'AN' || 'AV' )

is finding the first of those text strings that is true, which is A, and comparing that with $col, which isn't what you want at all. You would have to write

$col eq 'A' || $col eq 'S' || $col eq 'T' || $col eq 'AA' || $col eq 'AX' || $col eq 'BC' || $col eq 'D' || $col eq 'AN' || $col eq 'AV'

except that the module doesn't deal in column labels like that so you need to transform them to numbers

I've written this, which should work. It first builds a hash which relates each column label to its number for the first 100 columns. Then I've used that hash to convert your list of labels to a list of column numbers, and iterated over that. (It produces [1, 4, 19, 20, 27, 40, 48, 50, 55] if you're interested.) It's much neater than iterating over all the columns and ignoring the ones you don't want

It's not tested as I don't have a data file to test with, but it does compile

I'm not at all sure about your code that skips a cell if it contains a false value. That is going to skip empty cells and cells containing the number zero, and it will leave the columns in your CSV file misaligned as a result. The part of my code that does that is @csv_row = grep $_, @csv_row and you may want to remove that if you agree with me

#!/usr/bin/perl

use strict;
use warnings;

use Spreadsheet::ParseXLSX;

my ($source, $dest) = qw/ headers.xlsx headers.csv /;

my $excel = Spreadsheet::ParseXLSX->new;
my $workbook = $excel->parse($source);
my $worksheet = $workbook->worksheet(0);
my ($row_min, $row_max) = $worksheet->row_range;

my %col_numbers;
for ( my ($n, $name) = (1, 'A'); $n <= 100; ++$n, ++$name ) {
    $col_numbers{$name} = $n;
}

my @cols = sort { $a <=> $b }
        map $col_numbers{$_}, qw/ A  S  T  AA  AX  BC  D  AN  AV /;

open my $fh, '>', $dest
        or die qq{Unable to open "$dest" for output: $!};

for my $row ( $row_min .. $row_max ) {
    my @csv_row = map $worksheet->get_cell($row, $_)->unformatted, @cols;
    print $fh join(',', @csv_row), "\n";
}

close $fh;

Upvotes: 1

ysth
ysth

Reputation: 98398

First, $col is numeric, so you shouldn't be testing it against string column names.

Second, you need e.g.

$col==7 || $col==9 ||...

As you have it, it is only testing $col eq 'A' (since that is the result of all the ||'s).

Upvotes: 2

Related Questions