Reputation: 7411
I am using Text::CSV to parse a CSV file. Some lines cannot be parsed because they contain bad characters.
The Text::CSV documentation says:
Allowable characters within a CSV field include 0x09 (tab) and the inclusive range of 0x20 (space) through 0x7E (tilde).
How can I filter out any disallowed characters as easily as possible?
Upvotes: 2
Views: 1325
Reputation: 62089
Instead of filtering out the "bad" characters, you probably want to use the binary
flag to tell Text::CSV to stop enforcing its ASCII-only rule:
my $csv = Text::CSV->new ({ binary => 1 });
If you're trying to read a file that's in a non-ASCII character set (e.g. Latin-1 or UTF-8), you should look at the Text::CSV::Encoded module.
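As a rough sketch of what the binary flag changes, the snippet below parses a single in-memory line that contains Latin-1 bytes. The sample line and the variable names are made up for illustration, and auto_diag is an extra option not mentioned above; it only makes Text::CSV report parse errors.

```perl
use strict;
use warnings;
use Text::CSV;

# binary => 1 lifts the ASCII-only restriction; auto_diag => 1 (an addition
# for this sketch) reports parse errors instead of failing silently.
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag();

# Hypothetical sample line with Latin-1 bytes (0xFC is u-umlaut).
my $line = qq{1,"M\xFCller",Z\xFCrich};

# parse() returns true on success; fields() then returns the parsed columns.
my $ok     = $csv->parse($line);
my @fields = $ok ? $csv->fields() : ();

printf "parsed %d fields\n", scalar @fields;
```

For a real file you would typically call getline on an open filehandle rather than parse on a string.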
Upvotes: 9
Reputation: 336128
$subject =~ s/[^\x09\x20-\x7E]+//g;
will remove all those characters.
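A minimal sketch of that substitution wrapped in a helper, assuming you filter one line at a time. The name strip_disallowed and the sample string are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the text with every character
# outside tab (0x09) and the range 0x20-0x7E removed.
sub strip_disallowed {
    my ($text) = @_;
    (my $clean = $text) =~ s/[^\x09\x20-\x7E]+//g;
    return $clean;
}

# Sample input containing a Latin-1 byte (0xE9) and a NUL byte.
my $raw   = "caf\xE9\tau lait\x00,42";
my $clean = strip_disallowed($raw);

print "$clean\n";
```

Note that the character class also excludes \x0A and \x0D, so running this on a whole file's contents would strip the line endings too. Apply it to each line after chomp, or add those two characters to the class.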
But this seems like a strange limitation on what's allowed in a CSV file. I haven't yet seen a CSV parser that couldn't handle, for example, umlauts and other non-ASCII characters. I don't know Perl, though.
Upvotes: 0