srchulo

Reputation: 5203

Issues parsing a CSV file in Perl using Text::CSV

I'm trying to use Text::CSV to parse this CSV file. Here is how I am doing it:

use strict;
use warnings;
use Text::CSV_XS;

open my $fh, '<', 'test.csv' or die "can't open csv: $!";
my $csv = Text::CSV_XS->new ({ sep_char => "\t", binary => 1, eol => "\n" });
$csv->column_names($csv->getline($fh));

while (my $row = $csv->getline_hr($fh)) {
    # use row
}

Because the file has 169,252 rows (not counting the header line), I expect the loop to run that many times. However, it only runs 8 times and gives me 8 rows. I'm not sure what's happening, because the file looks like a normal CSV file, with \n as the line separator and \t as the field separator. If I loop through the file like this:

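while (my $line = <$fh>) {
    # parse() only returns a success flag; the values come from fields().
    # A failed parse does not end this loop, so every line is visited.
    my @fields = $csv->parse($line) ? $csv->fields : ();
}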

Then the loop goes through all rows.

Upvotes: 2

Views: 5715

Answers (1)

Drav Sloan

Reputation: 1562

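Text::CSV_XS is failing silently with a parse error: getline_hr returns undef when it hits an error, just as it does at end of file, so your while loop simply stops. If you put the following after your while loop: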

 my ($cde, $str, $pos) = $csv->error_diag ();
 print "$cde, $str, $pos\n";

you can see whether there were any errors parsing the file. For your file, the output is:

2034, EIF - Loose unescaped quote, 336

This means that the field:

GT New Coupe 5.0L CD Wheels: 18" x 8" Magnetic Painted/Machined 6 Speakers

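contains quote characters (the " inch marks) even though the field itself is not enclosed in quotes. By default Text::CSV_XS only expects quote characters inside quoted fields, and there they must be escaped by doubling them as "" (escape_char defaults to "), not with a backslash.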
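A slightly more defensive way to write the loop is to call $csv->eof once getline_hr returns undef, which tells a clean end of file apart from a parse error (setting auto_diag => 1 in the constructor also makes the module warn on errors by itself). The sketch below is only an illustration: it replaces test.csv with a small hypothetical in-memory sample whose second data row imitates the loose-quote row above.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical sample data standing in for test.csv: the row with id 2
# has inch marks (") inside an unquoted field, like the row in the question.
my $data = qq{id\tdesc\n1\tplain\n2\tWheels: 18" x 8" Painted\n3\tlater\n};
open my $fh, '<', \$data or die "can't open in-memory data: $!";

my $csv = Text::CSV_XS->new({ sep_char => "\t", binary => 1 });
$csv->column_names($csv->getline($fh));

my $count = 0;
while (my $row = $csv->getline_hr($fh)) {
    $count++;
}

# getline_hr returns undef both at end of file and on a parse error,
# so eof() is what distinguishes the two cases.
my $failed = !$csv->eof;
if ($failed) {
    my ($code, $msg, $pos) = $csv->error_diag;
    warn "stopped after $count rows: $code, $msg at position $pos\n";
}
```

Here the loop stops after one row, and the warning reports the same EIF error as on the real file.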

The Text::CSV perldoc states:

allow_loose_quotes

By default, parsing fields that have quote_char characters inside an unquoted field, like

1,foo "bar" baz,42

would result in a parse error. Though it is still bad practice to allow this format, we cannot help there are some vendors that make their applications spit out lines styled like this.
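To see what this option changes, the sketch below parses the documentation's own example line twice, once with default settings and once with allow_loose_quotes enabled. It is comma-separated, as in the docs, rather than tab-separated like your file. With the defaults the parse should fail with error 2034; with the option on, the quotes are kept as literal characters in the middle field.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# The example line from the allow_loose_quotes documentation.
my $line = '1,foo "bar" baz,42';

# Default settings: a quote inside an unquoted field is an error.
my $strict = Text::CSV_XS->new({ binary => 1 });
my $strict_ok = $strict->parse($line);
my ($strict_code) = $strict->error_diag;    # first element is the numeric code

# With allow_loose_quotes the quotes are accepted as ordinary characters.
my $loose = Text::CSV_XS->new({ binary => 1, allow_loose_quotes => 1 });
my $loose_ok = $loose->parse($line);
my @fields = $loose->fields;

printf "strict: %s (error %d)\n", ($strict_ok ? 'ok' : 'failed'), $strict_code;
print  "loose:  ", join('|', @fields), "\n";
```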

If you change the arguments you pass to Text::CSV_XS->new to:

my $csv = Text::CSV_XS->new ({ sep_char => "\t", binary => 1,
    eol=> "\n", allow_loose_quotes => 1 });

the problem goes away, at least until row 105265, where error 2023 rears its head:

2023, EIQ - QUO character not allowed, 406

Details of this error in the perldoc:

2023 "EIQ - QUO character not allowed"

Sequences like "foo "bar" baz",qu and 2023,",2008-04-05,"Foo, Bar",\n will cause this error.

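Setting the quote character to empty (quote_char => '' in your call to Text::CSV_XS->new()) does seem to work around this and lets the whole file be processed. However, I would take the time to check whether this is a sane option for your data: with quoting disabled, quote characters are kept as literal text, and a quoted field that contains a tab or newline will no longer be read as a single field.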
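A minimal sketch of that trade-off is below, using two hypothetical tab-separated lines. It uses quote_char => undef rather than '' because the Text::CSV_XS documentation describes undef as suppressing quote characters, and to be safe it also sets escape_char => undef, since escape_char otherwise defaults to ". With quoting off, the inch marks survive as plain characters, but a field that relied on quotes to protect an embedded tab is split in two.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical input: line 1 has inch marks in an unquoted field,
# line 2 has a quoted field that contains a tab.
my @lines = (
    qq{2\tWheels: 18" x 8" Painted},
    qq{3\t"Foo\tBar"},
);

# Quoting and escaping both switched off, so " is an ordinary character.
my $csv = Text::CSV_XS->new({
    sep_char    => "\t",
    binary      => 1,
    quote_char  => undef,
    escape_char => undef,
});

my @parsed;
for my $line (@lines) {
    $csv->parse($line) or die "parse failed: " . $csv->error_diag;
    push @parsed, [ $csv->fields ];
}

# Line 1 keeps its inch marks; line 2 is split at the embedded tab,
# leaving literal quote characters on both halves.
print join(' | ', @$_), "\n" for @parsed;
```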

TL;DR: the long and short of it is that your CSV file is not well formed, and you will have to work around it.

Upvotes: 7
