Alexander Shalin

Reputation: 7

Problems with parsing CSV file in Perl

I have a CSV file like this:

id,item,itemtype,date,service,level,message,action,user
"344","-1","IRM","2008-08-22 13:01:57","login","1","Failed login: \'irm\', database \'irmD\'",NULL,NULL
"346","-1","IRM","2008-08-27 10:58:59","login","1","Ошибка входа:\'\', база данных \'irmD\'",NULL,NULL

The second line parses fine, but Text::CSV silently skips the third one. The third line contains Cyrillic characters, but the file is encoded in UTF-8, so Perl shouldn't have any problem with it.

And the code:

#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV;
use utf8;

my $file = 'Test.csv';
my $csv = Text::CSV->new();
open (CSV, "<", $file) or die $!;
while (<CSV>) {
    if ($csv->parse($_)) {
        if ($. == 1) {
            next;
        }
        my @columns = $csv->fields();
        my $id=$columns[0];
        print $id." ";
    }
}
print "\n";
close CSV;

Any help or hint will be appreciated.

Upvotes: 0

Views: 168

Answers (2)

choroba

Reputation: 242363

Did you read the documentation of Text::CSV?

If your data contains newlines embedded in fields, or characters above 0x7e (tilde), or binary data, you must set "binary => 1"

Also, "use utf8" tells Perl that the source code itself is written in UTF-8; it has no effect on the data you read. Remove it.
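To illustrate the point about the pragma, here is a minimal sketch (not part of the original answer) that assumes the script file itself is saved as UTF-8. It measures the same string literal with and without the pragma in effect; nothing in it touches file input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "use utf8" is lexically scoped, so each do-block sees its own setting.
# With the pragma, the literal is decoded into 6 characters.
my $chars = do { use utf8; length "Ошибка" };

# Without it, the same literal is just the 12 raw UTF-8 bytes.
my $bytes = do { no utf8; length "Ошибка" };

print "$chars $bytes\n";
```

Either way, data read from a filehandle is unaffected; decoding input is controlled by I/O layers, not by this pragma.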

The documentation also warns against reading CSV with <>, because a line-by-line read can split a quoted field that contains an embedded newline:

while (<>) {           #  WRONG!

Here is a working version:

#!/usr/bin/perl
use warnings;
use strict;

use Text::CSV;

my $file = 'Test.csv';
my $csv = 'Text::CSV'->new({ binary => 1 }) or die 'Text::CSV'->error_diag;
open my $CSV, '<', $file or die $!;
while (my $line = $csv->getline($CSV)) {
    next if 1 == $.;

    my @columns = @$line;
    my $id = $columns[0];
    print $id . " ";
}
print "\n";
close $CSV;
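If the decoded text of the fields matters too (for example the Cyrillic message column), a possible variation is sketched below. It is an illustration, not the code from the answer: the sample rows are copied from the question into an in-memory filehandle so the script runs without Test.csv, the header is skipped with an extra getline call instead of checking $., and the :encoding(UTF-8) layer is an added assumption about how you want to read the data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# No "use utf8" here, so this heredoc holds raw UTF-8 bytes,
# just like the contents of a file on disk.
my $bytes = <<'CSV';
id,item,itemtype,date,service,level,message,action,user
"344","-1","IRM","2008-08-22 13:01:57","login","1","Failed login: \'irm\', database \'irmD\'",NULL,NULL
"346","-1","IRM","2008-08-27 10:58:59","login","1","Ошибка входа:\'\', база данных \'irmD\'",NULL,NULL
CSV

my $csv = Text::CSV->new({ binary => 1 }) or die Text::CSV->error_diag;

# Read from the in-memory string, decoding UTF-8 on the way in.
open(my $fh, '<:encoding(UTF-8)', \$bytes) or die $!;

$csv->getline($fh);    # consume the header row

my @ids;
while (my $row = $csv->getline($fh)) {
    push @ids, $row->[0];    # first column is the id
}
close $fh;

print "@ids\n";
```

For a real file, replace \$bytes with the file name in the open call.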

Upvotes: 3

Sobrique

Reputation: 53508

I think the problem is that although you have "use utf8" in your script, that pragma only affects how Perl interprets the source code itself. From http://perldoc.perl.org/utf8.html:

utf8 - Perl pragma to enable/disable UTF-8 (or UTF-EBCDIC) in source code

Looking at the Text::CSV documentation, you probably want:

$csv = Text::CSV::Encoded->new ({ encoding  => "utf8" });

You will probably also need to specify that you're opening a UTF-8 file. You can do this either as part of the open call or with binmode:

open ( my $filehandle, "<:encoding(UTF-8)", "Test.csv" ) or die $!;
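The binmode route is not shown above, so here is a small sketch comparing the two. To keep it self-contained it writes a made-up one-line sample to a temporary file instead of using Test.csv; the variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small UTF-8 sample file (the word "Ошибка" as raw bytes).
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out, ':raw');
print {$out} "\xD0\x9E\xD1\x88\xD0\xB8\xD0\xB1\xD0\xBA\xD0\xB0\n";
close $out or die $!;

# Variant 1: declare the encoding in the open mode.
open(my $fh1, '<:encoding(UTF-8)', $path) or die $!;
my $via_open = <$fh1>;
close $fh1;

# Variant 2: open normally, then add the layer with binmode.
open(my $fh2, '<', $path) or die $!;
binmode($fh2, ':encoding(UTF-8)') or die $!;
my $via_binmode = <$fh2>;
close $fh2;

chomp($via_open, $via_binmode);

# Both variants give the same decoded string: 6 characters, not 12 bytes.
printf "%d %d\n", length $via_open, length $via_binmode;
```

The open-mode form is usually the safer habit, since the layer is in place before any data is read.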

Upvotes: 0
