weismat

Reputation: 7411

How do I read a CSV file containing non-ASCII characters in Perl?

I am using Text::CSV to parse a CSV file. Some lines cannot be parsed because they contain characters the module rejects.
The Text::CSV documentation says:

Allowable characters within a CSV field include 0x09 (tab) and the inclusive range of 0x20 (space) through 0x7E (tilde).
What is the easiest way to filter out any disallowed characters?

Upvotes: 2

Views: 1325

Answers (2)

cjm

Reputation: 62089

Instead of filtering out the "bad" characters, you probably want to use the binary flag to tell Text::CSV to stop enforcing its ASCII-only rule:

my $csv = Text::CSV->new ({ binary => 1 });

If you're trying to read a file that's in a non-ASCII character set (e.g. Latin-1 or UTF-8), you should look at the Text::CSV::Encoded module.
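As a rough sketch of the binary approach, the snippet below parses UTF-8 data with `binary => 1` and decodes it through an `:encoding(UTF-8)` layer on the filehandle. The sample data, the in-memory filehandle, and the variable names are assumptions made for illustration; with a real file you would open its path instead, using whatever encoding the file is actually in.

```perl
use strict;
use warnings;
use Text::CSV;

# Illustrative sample data as raw UTF-8 bytes (stands in for a file on disk).
my $bytes = "name,city\nJ\xC3\xBCrgen,M\xC3\xBCnchen\n";

# binary => 1 allows characters outside the printable ASCII range in fields.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# Decode UTF-8 while reading; the in-memory handle is an assumption for the demo.
open my $fh, '<:encoding(UTF-8)', \$bytes or die "open failed: $!";

my @rows;
while (my $row = $csv->getline($fh)) {
    push @rows, $row;    # each $row is an array reference of decoded fields
}
close $fh;
```

Each element of `@rows` then holds the fields of one line as character strings, so `$rows[1][0]` is "Jürgen" rather than a sequence of raw bytes.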

Upvotes: 9

Tim Pietzcker

Reputation: 336128

$subject =~ s/[^\x09\x20-\x7E]+//g;

will remove every character outside the allowed range (tab and 0x20 through 0x7E).
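For illustration, here is the same substitution wrapped in a small helper and applied to a sample string. The helper name `strip_disallowed` and the sample input are hypothetical, not part of Text::CSV.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only tab (0x09) and printable ASCII (0x20-0x7E).
sub strip_disallowed {
    my ($text) = @_;
    $text =~ s/[^\x09\x20-\x7E]+//g;
    return $text;
}

# Sample input with two umlauts (0xFC) and a NUL byte.
my $subject = "M\x{FC}ller,\x00Z\x{FC}rich\tCH";
my $clean   = strip_disallowed($subject);

print "$clean\n";    # prints "Mller,Zrich" followed by a tab and "CH"
```

Note that the character class also matches newlines (0x0A) and carriage returns (0x0D), so this is best applied to one line at a time after `chomp`. It also discards data: the umlauts are dropped, not transliterated.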

But this seems like a strange limitation on what's allowed in a CSV file. I haven't yet seen a CSV parser that couldn't handle, for example, umlauts and other non-ASCII characters. I don't know Perl, though.

Upvotes: 0
