Reputation: 7
I have a CSV file like this:
id,item,itemtype,date,service,level,message,action,user
"344","-1","IRM","2008-08-22 13:01:57","login","1","Failed login: \'irm\', database \'irmD\'",NULL,NULL
"346","-1","IRM","2008-08-27 10:58:59","login","1","Ошибка входа:\'\', база данных \'irmD\'",NULL,NULL
The second line parses fine, but Text::CSV just skips the third one. The third line contains Cyrillic characters, but the file is encoded in UTF-8, so Perl shouldn't have any problems with that.
And the code:
#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV;
use utf8;
my $file = 'Test.csv';
my $csv = Text::CSV->new();
open (CSV, "<", $file) or die $!;
while (<CSV>) {
if ($csv->parse($_)) {
if ($. == 1) {
next;
}
my @columns = $csv->fields();
my $id=$columns[0];
print $id." ";
}
}
print "\n";
close CSV;
Any help or hint will be appreciated.
Upvotes: 0
Views: 168
Reputation: 242363
Did you read the documentation of Text::CSV?
If your data contains newlines embedded in fields, or characters above 0x7e (tilde), or binary data, you must set "binary => 1"
Also, use utf8 tells Perl that the source code itself is written in UTF-8; it has no effect on the data you read. Remove it.
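To illustrate why the pragma is unrelated to the data, here is a small sketch using only the core Encode module. The sample word is my own choice, written with escapes so the script stays plain ASCII. It compares the same word as raw UTF-8 bytes, which is what an undecoded read gives you, and as decoded characters:

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Encode qw(encode decode);

# Six Cyrillic letters, as the raw UTF-8 bytes a file would contain.
my $bytes = encode('UTF-8', "\x{41e}\x{448}\x{438}\x{431}\x{43a}\x{430}");

# Without decoding, Perl sees 12 separate bytes, every one above 0x7e.
my $byte_len = length $bytes;

# Decoding turns those bytes into 6 characters.
my $chars    = decode('UTF-8', $bytes);
my $char_len = length $chars;

print "bytes: $byte_len, characters: $char_len\n";   # bytes: 12, characters: 6
```

Those high bytes are exactly the "characters above 0x7e" the documentation quote refers to, which is why the parser needs binary => 1.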
The documentation also warns against reading CSV line by line with <>, which breaks on fields that contain embedded newlines:
while (<>) { # WRONG!
Here is a working version:
#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV;
my $file = 'Test.csv';
my $csv = 'Text::CSV'->new({ binary => 1 }) or die 'Text::CSV'->error_diag;
open my $CSV, '<', $file or die $!;
while (my $line = $csv->getline($CSV)) {
next if 1 == $.;
my @columns = @$line;
my $id = $columns[0];
print $id . " ";
}
print "\n";
close $CSV;
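If you also want the fields as decoded characters rather than raw bytes, something like the following should work. Treat it as a sketch, assuming Text::CSV is installed: read_ids is a hypothetical helper, and the two-column sample data lives in an in-memory filehandle only so the script runs without Test.csv.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV;
use Encode qw(encode);

# Hypothetical helper: returns the first column of every data row.
sub read_ids {
    my ($fh) = @_;
    my $csv = Text::CSV->new({ binary => 1 })
        or die Text::CSV->error_diag;
    $csv->getline($fh);                      # skip the header row
    my @ids;
    while (my $row = $csv->getline($fh)) {
        push @ids, $row->[0];
    }
    return @ids;
}

# Sample data as UTF-8 bytes; the second data row holds Cyrillic text.
my $bytes = encode('UTF-8',
    qq{id,message\n"344","Failed login"\n"346","\x{41e}\x{448}\x{438}\x{431}\x{43a}\x{430}"\n});

# The :encoding(UTF-8) layer decodes the bytes as they are read.
open my $fh, '<:encoding(UTF-8)', \$bytes or die $!;
my @ids = read_ids($fh);
close $fh;

print "@ids\n";
```

For a real file, replace \$bytes with the file name. Only the ids are printed here, so no output layer is needed; printing the Cyrillic fields themselves would also need binmode STDOUT, ':encoding(UTF-8)'.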
Upvotes: 3
Reputation: 53508
I think your problem is that although you've used use utf8, that pragma only affects how perl reads your source code.
From http://perldoc.perl.org/utf8.html:
utf8 - Perl pragma to enable/disable UTF-8 (or UTF-EBCDIC) in source code
Looking at the Text::CSV documentation, you probably want Text::CSV::Encoded:
$csv = Text::CSV::Encoded->new ({ encoding => "utf8" });
You will also probably need to specify that you're opening a UTF-8 file. You can do this either as part of the open or with binmode:
open ( my $filehandle, "<:encoding(UTF-8)", "Test.csv" ) or die $!;
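For comparison, here is a rough sketch of both forms side by side. The in-memory filehandle and the sample word are my own stand-ins so the script runs without Test.csv; with a real file you would pass the file name instead of \$bytes.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Encode qw(encode);

# A four-letter Cyrillic word plus newline, as UTF-8 bytes.
my $bytes = encode('UTF-8', "\x{431}\x{430}\x{437}\x{430}\n");

# Option 1: name the encoding layer in the open call.
open my $in1, '<:encoding(UTF-8)', \$bytes or die $!;
my $line1 = <$in1>;
close $in1;

# Option 2: open plainly, then add the layer with binmode before reading.
open my $in2, '<', \$bytes or die $!;
binmode $in2, ':encoding(UTF-8)';
my $line2 = <$in2>;
close $in2;

chomp($line1, $line2);
print length($line1), " ", length($line2), "\n";   # 4 4
```

Both handles return the same decoded string. The binmode form is mainly useful when you did not open the handle yourself, for example STDIN.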
Upvotes: 0