AgA

Reputation: 2126

Unable to parse csv file using Text::CSV

I have a CSV file in this format:

"Keyword"   "Competition"   "Global Monthly Searches"   "Local Monthly Searches (United States)"    "Approximate CPC (Search) - INR"

"kasperaky support" -0  -0  -0  -0

The first line contains the column titles.

I've tried most of the options in Text::CSV, but I'm not able to extract the fields.

Here I set sep_char => ' '.

The closest I got was extracting the first word of the first column ("kasperaky" only).

I'm creating the object this way (while trying various settings):

my $csv = Text::CSV->new({
    binary             => 1,
    sep_char           => ' ',
    allow_loose_quotes => 0,
    quote_space        => 0,
    quote_char         => '"',
    allow_whitespace   => 0,
    eol                => "\015\012",
}) or die "Cannot use CSV: " . Text::CSV->error_diag();

Upvotes: 2

Views: 2830

Answers (4)

avrono

Reputation: 1680

This worked for me on a file whose fields are separated by one or more spaces. This is a case where Text::CSV does not do the job.

open(my $data, '<:encoding(UTF-8)', $filename)
    or die "Cannot open $filename: $!";

while (my $line = <$data>) {
    chomp $line;
    # A ' ' pattern splits on runs of whitespace and ignores leading whitespace.
    my @fields = split(' ', $line);
    print "$line : $fields[0] --- $fields[1] ----- $fields[2]\n";
}
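One thing worth checking before relying on split here: it splits on every run of whitespace, including whitespace inside quotes. The following is only a minimal sketch, assuming the sample data row from the question, that compares the field count from split with the one from the core module Text::ParseWords.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Sample data row from the question (fields separated by runs of spaces).
my $line = '"kasperaky support" -0  -0  -0  -0';

# split ' ' treats every run of whitespace as a separator, quoted or not.
my @by_split = split ' ', $line;

# parse_line keeps quoted text together; the 0 means "drop the quotes".
my @by_parser = parse_line(qr/\s+/, 0, $line);

printf "split:      %d fields\n", scalar @by_split;
printf "parse_line: %d fields\n", scalar @by_parser;
```

If the two counts differ for your file, some quoted field contains whitespace and plain split will cut it apart.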

Upvotes: 0

Joel Berger

Reputation: 20280

I always recommend using a parser, and usually Text::CSV is great, but when you are not working with real CSV, it can sometimes be a pain. You might try using the core module Text::ParseWords in this case.

Here is my example.

#!/usr/bin/env perl

use strict;
use warnings;

use Text::ParseWords qw/parse_line/;

my @data;
while( my $line = <DATA> ) {
  chomp $line;
  my @words = parse_line( qr/\s+/, 0, $line );
  next unless @words;
  push @data, \@words;
}

use Data::Dumper;
print Dumper \@data;

__DATA__

"Keyword"   "Competition"   "Global Monthly Searches"   "Local Monthly Searches (United States)"    "Approximate CPC (Search) - INR"

"kasperaky support" -0  -0  -0  -0

This implementation builds up a 2D array of your data, skipping blank lines. Of course, you can build whatever data structure you want once you have parsed the tokens. The output is:

$VAR1 = [
          [
            'Keyword',
            'Competition',
            'Global Monthly Searches',
            'Local Monthly Searches (United States)',
            'Approximate CPC (Search) - INR'
          ],
          [
            'kasperaky support',
            '-0',
            '-0',
            '-0',
            '-0'
          ]
        ];
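As an illustration of building a different data structure from the same tokens, here is a rough sketch that turns each data row into a hash keyed by the column titles. The helper name rows_to_records is hypothetical, and it assumes the first non-blank line is the header.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper: turn whitespace-separated, optionally quoted lines
# into a list of hash references keyed by the header row.
sub rows_to_records {
    my @lines = @_;
    my (@header, @records);
    for my $line (@lines) {
        chomp $line;
        my @words = parse_line(qr/\s+/, 0, $line);
        next unless @words;            # skip blank lines
        if (!@header) {                # first non-blank line is the header
            @header = @words;
            next;
        }
        my %record;
        @record{@header} = @words;     # hash slice pairs titles with values
        push @records, \%record;
    }
    return @records;
}

my @records = rows_to_records(
    '"Keyword"   "Competition"   "Global Monthly Searches"',
    '',
    '"kasperaky support" -0  -0',
);
print "$records[0]{Keyword}: $records[0]{Competition}\n";
```

Each record can then be addressed by column title, for example $records[0]{'Global Monthly Searches'}, instead of by position.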

Upvotes: 1

Zaid

Reputation: 37156

Your CSV is tab-separated. Use the following (code is tested to work against your example file):

use strictures;
use autodie qw(:all);       # automatic error checking open/close
use charnames qw(:full);    # \N named characters
use Text::CSV qw();
my $csv = Text::CSV->new({
    auto_diag   => 2,       # automatic error checking CSV methods
    binary      => 1,
    eol         => "\N{CR}\N{LF}",
    sep_char    => "\N{TAB}",
}) or die 'Cannot use CSV: ' . Text::CSV->error_diag;

open my $fh, '<:encoding(ASCII)', 'computer crash.csv';
while (my $row = $csv->getline($fh)) {
    ...
}
close $fh;

Upvotes: 5

ikegami

Reputation: 386686

To call that a CSV file is a bit of a stretch! Your separator isn't a space, it's a sequence of one or more spaces, and Text::CSV doesn't handle that. (allow_whitespace doesn't work when your separator is a space, unfortunately.) You could use something like:

use List::MoreUtils qw( apply );
my @fields = apply { s/\\(.)/$1/sg } $line =~ /"((?:[^"\\]|\\.)*)"/sg;

Now, if those are tabs, that's a different story, and you could use sep_char => "\t".
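To find out which case you are in, it can help to make the whitespace visible. This is just a sketch with two hypothetical helpers (show_whitespace and guess_separator), and the sample string is an assumed tab-separated variant of your row, not your actual file.

```perl
use strict;
use warnings;

# Hypothetical helper: make tabs, CRs and LFs visible so the real
# separator can be read off the output.
sub show_whitespace {
    my ($line) = @_;
    my %visible = ("\t" => '\t', "\r" => '\r', "\n" => '\n');
    (my $shown = $line) =~ s/([\t\r\n])/$visible{$1}/g;
    return $shown;
}

# Hypothetical helper: guess the separator from one line.
# A tab anywhere in the line is taken to mean tab-separated.
sub guess_separator {
    my ($line) = @_;
    return $line =~ /\t/ ? "\t" : ' ';
}

# Assumed tab-separated variant of the sample row, for illustration only.
my $sample = qq{"kasperaky support"\t-0\t-0\r\n};
print show_whitespace($sample), "\n";
```

If the printed line shows \t between the fields, sep_char => "\t" is the setting to try. If it shows only spaces, Text::CSV is not the right tool for this file.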

Upvotes: 4
