Reputation: 13062
I have a tab-delimited data set in which the user-agent strings are enclosed in double quotes. I need to parse each of these columns and, based on the answer to my other post, I used the Text::CSV module.
94410634 0 GET "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6.6; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; AskTB5.5)" 1
The code is simple:
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new(sep_char => "\t");
while (<>) {
    if ($csv->parse($_)) {
        my @columns = $csv->fields();
        print "@columns\n";
    } else {
        my $err = $csv->error_input;
        print "Failed to parse line: $err";
    }
}
But I get the "Failed to parse line:" error when I run it on this data set. What am I doing wrong? I need to extract the 4th column, which contains the user-agent strings, for further processing.
Upvotes: 2
Views: 1694
Reputation: 129549
Your constructor arguments should be in a hashref, not a hash:
my $csv = Text::CSV->new( { sep_char => "\t" } );
Are you sure the data set is exactly what you think it is? Maybe a double quote is missing somewhere, or the fields are not actually separated by tabs?
To verify the file contents: are you on Unix/Linux or Windows? On Unix, please run this:
cat -vet my_log_file_name | head -3
and check whether the output has spaces or "^I" sequences where you expect tabs. cat -vet prints special characters as printable sequences (TAB => ^I, newline => $, etc.).
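As a rough illustration of what that check looks like, here is a small sketch that builds one sample record with real tabs and pipes it through cat -vet. The shortened user-agent string and the variable names (line, visible, tabs) are made up for the example; with your real file you would run cat -vet on the file itself.

```shell
# Build one sample record with real tab characters between the fields.
# (The user-agent string is shortened here purely for illustration.)
line=$(printf '94410634\t0\tGET\t"Mozilla/4.0 (compatible; MSIE 8.0)"\t1')

# cat -vet shows each tab as ^I and marks the end of the line with $.
visible=$(printf '%s\n' "$line" | cat -vet)
printf '%s\n' "$visible"

# Count the tabs: a five-column record should contain exactly four.
tabs=$(printf '%s' "$line" | tr -cd '\t' | wc -c | tr -d ' ')
printf 'tabs=%s\n' "$tabs"
```

If the first line of output shows plain spaces where ^I is expected, or the tab count is not 4, the file is not really tab-delimited and Text::CSV with sep_char => "\t" will not split it the way you expect.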
The following test works perfectly on my ActivePerl:
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
# One sample record with explicit tabs between the five fields.
my $s = qq[94410634\t0\tGET\t"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6.6; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; AskTB5.5)"\t1\n];
my $csv = Text::CSV->new({sep_char => "\t"});
if ($csv->parse($s)) {
    my @columns = $csv->fields();
    print "c=$columns[3]\n";    # 4th column: the user-agent string
} else {
    my $err = $csv->error_input;
    print "Failed to parse line: $err";
}
Output:
C:\> perl d:\scripts\test4.pl
c=Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6.6; ...
Upvotes: 6