Hesei

Reputation: 35

Skip bad CSV lines in Perl with Text::CSV

I have a script that is essentially still in testing. I would like to use Text::CSV to break down large quantities of CSV files that are dumped hourly.

These files can be quite large and of inconsistent quality. Sometimes I'll get strange characters or data, but the usual issue is lines that just stop.

"Something", "3", "hello wor

The unclosed quote is my biggest hurdle. The script just breaks: the error goes to STDERR and my while loop exits.

while (my $row = $csv->getline($data))

The error I get is...

# CSV_PP ERROR: 2025 - EIQ - Loose unescaped escape

I can't seem to do any kind of error handling for this. If I enable allow_loose_escapes, all I get instead is a lot of errors, because the parser then treats the following lines as part of the same row.

Upvotes: 3

Views: 1485

Answers (1)

TLP

Reputation: 67910

Allowing the loose escape is not the answer. It just makes your program ignore the error and try to incorporate the broken line with your other lines, as you also mentioned. Instead you can try to catch the problem, and check your $row for definedness:

use strict;
use warnings;
use Text::CSV;
use feature 'say';

my $csv = Text::CSV->new({
        binary  => 1,
        eol => $/,
    });

while (1) {
    my $row = $csv->getline(*DATA);
    $csv->eof and last; 
    if (defined $row) {
        $csv->print(*STDOUT, $row);
    } else {
        say "==" x 10;
        print "Bad line, skipping\n";
        say $csv->error_diag();
        say "==" x 10;
    }
}


__DATA__
1,2,3,4
a,b,c,d
"Something", "3", "hello wor
11,22,33,44

For me this outputs:

1,2,3,4
a,b,c,d
====================
Bad line, skipping
2034EIF - Loose unescaped quote143
====================
11,22,33,44
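
The diagnostic line in that output runs together because error_diag() returns a list when called in list context (error code, message, position, and so on), and say prints the elements back to back. A minimal sketch of a more readable report, assuming the same constructor options as above and an in-memory copy of the sample data (the $data string and the message format are only for illustration):

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, eol => $/ });

# In-memory copy of the sample data, so the sketch is self-contained.
my $data = join "\n",
    '1,2,3,4',
    '"Something", "3", "hello wor',
    '11,22,33,44',
    '';
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my @messages;
while (1) {
    my $row = $csv->getline($fh);
    last if $csv->eof;          # stop at end of input
    next if defined $row;       # good row, nothing to report

    # In list context the first three elements are code, message, position.
    my ($code, $text, $pos) = $csv->error_diag();
    push @messages, "error $code ($text) at position $pos";
}

print "$_\n" for @messages;
```

Unpacking the list into named variables lets you log just the parts you need, or branch on the numeric code.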

If you want to save the broken lines, you can access them with $csv->error_input(), e.g.:

print $badlines $csv->error_input();
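
For context, here is a sketch of how $badlines could be wired into the loop. It assumes the same sample data as above and writes to an in-memory buffer so it runs on its own. In a real script you would open a file for writing instead, and the counters are only there for illustration:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, eol => $/ });

# In-memory input and output; swap these for real files in practice.
my $data = join "\n",
    '1,2,3,4',
    'a,b,c,d',
    '"Something", "3", "hello wor',
    '11,22,33,44',
    '';
open my $in, '<', \$data or die "Cannot open input: $!";

my $bad_buffer = '';
open my $badlines, '>', \$bad_buffer or die "Cannot open bad-line buffer: $!";

my ($good_count, $bad_count) = (0, 0);
while (1) {
    my $row = $csv->getline($in);
    last if $csv->eof;
    if (defined $row) {
        $good_count++;          # process the parsed row here
    } else {
        $bad_count++;
        # error_input() holds the raw text that failed to parse, if any.
        print {$badlines} $csv->error_input() // '';
    }
}
close $badlines or die "Cannot close bad-line buffer: $!";

print "good: $good_count, bad: $bad_count\n";
```

Keeping the rejected text separate means the hourly dumps can still be processed while the broken lines are reviewed later.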

Upvotes: 5
