packetie

Reputation: 5059

Bug with parsing by Text::CSV_XS?

I tried to use Text::CSV_XS to parse some logs. However, the following code doesn't do what I expected, which is to split the line into pieces on the separator " ".

The funny thing is, if I remove the double quotes from the string $a, it does split the line.

Wonder if it's a bug or I missed something. Thanks!

use Text::CSV_XS;

$a = 'id=firewall time="2010-05-09 16:07:21 UTC"';

$userDefinedSeparator = Text::CSV_XS->new({sep_char => " "});
print "$userDefinedSeparator\n";
$userDefinedSeparator->parse($a);
my $e;
foreach $e ($userDefinedSeparator->fields) {
    print $e, "\n";
}

EDIT:

In the above code snippet, if I change the = (after time) to a space, then it works fine. I'm starting to wonder whether this is a bug after all.

$a = 'id=firewall time "2010-05-09 16:07:21 UTC"';

Upvotes: 1

Views: 2556

Answers (2)

Borodin

Reputation: 126722

You have confused the module by leaving both the quote character and the escape character set to their default, the double quote ", while your data has double quotes embedded in the fields you want to split.
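To see the failure itself rather than just an empty field list, it can help to check the return value of parse. The following is a minimal sketch, assuming the default quote_char and escape_char; it uses the module's error_diag method to report why parsing stopped, and the variable names are only illustrative.

```perl
use strict;
use warnings;

use Text::CSV_XS;

my $line = 'id=firewall time="2010-05-09 16:07:21 UTC"';

# quote_char and escape_char are left at their default, the double quote
my $csv = Text::CSV_XS->new({ sep_char => ' ' });

# parse returns a false value when the line cannot be parsed
my $ok = $csv->parse($line);

if ($ok) {
    print "$_\n" for $csv->fields;
}
else {
    # In list context error_diag returns the error code, message and position
    my ($code, $message, $position) = $csv->error_diag;
    print "parse failed: $code $message (position $position)\n";
}
```

Checking the result of parse this way makes it clearer that the module rejected the line, rather than silently splitting it wrongly.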

Disable both quote_char and escape_char, like this

use strict;
use warnings;

use Text::CSV_XS;

my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';

my $space_sep = Text::CSV_XS->new({
   sep_char    => ' ',
   quote_char  => undef,
   escape_char => undef,
});

$space_sep->parse($string);

for my $field ($space_sep->fields) {
    print "$field\n";
}

output

id=firewall
time="2010-05-09
16:07:21
UTC"

But note that you have achieved exactly the same thing as print "$_\n" for split ' ', $string, which is to be preferred as it is both more efficient and more concise.

In addition, you must always use strict and use warnings, and never use $a or $b as variable names, both because they are used by sort and because they are meaningless and undescriptive.
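As a small illustration of why $a and $b are special, here is a sketch of how sort uses them. The list of numbers is made up for the example.

```perl
use strict;
use warnings;

my @numbers = (10, 9, 100);

# sort sets the package variables $a and $b to the two elements being compared,
# which is why they need no declaration even under "use strict"
my @sorted = sort { $a <=> $b } @numbers;

print "@sorted\n";   # prints "9 10 100"
```

Using $a or $b for your own data in the same scope risks interfering with comparison blocks like this one.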


Update

As @ThisSuitIsBlackNot points out, your intention is probably not to split on spaces but to extract a series of key=value pairs. If so, this regex puts the pairs straight into a hash. Note that a quoted value keeps its surrounding double quotes.

use strict;
use warnings;

my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';

my %data = $string =~ / ([^=\s]+) \s* = \s* ( "[^"]*" | [^"\s]+ ) /xg;

use Data::Dump;
dd \%data;

output

{ id => "firewall", time => "\"2010-05-09 16:07:21 UTC\"" }
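If the surrounding quotes are not wanted in the hash values, one possible follow-up step is to strip them after the match. This is only a sketch built on the same regex; the extra substitution is an addition for illustration and assumes values never contain embedded double quotes.

```perl
use strict;
use warnings;

my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';

my %data = $string =~ / ([^=\s]+) \s* = \s* ( "[^"]*" | [^"\s]+ ) /xg;

# values() aliases the hash values, so the substitution edits them in place;
# only a value that both starts and ends with a double quote is changed
s/\A"(.*)"\z/$1/s for values %data;

print "$_ => $data{$_}\n" for sort keys %data;
```

This prints the id and time pairs with the quotes around the timestamp removed.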

Update

This program will extract the two name=value strings and print them on separate lines.

use strict;
use warnings;

my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';

my @fields = $string =~ / (?: "[^"]*" | \S )+ /xg;

print "$_\n" for @fields;

output

id=firewall
time="2010-05-09 16:07:21 UTC"

Upvotes: 3

TLP

Reputation: 67900

If you are not actually trying to parse CSV data, you can get the time field by using Text::ParseWords, which is a core module in Perl 5. The benefit of using this module is that it handles quoted strings well.

use strict;
use warnings;
use Data::Dumper;
use Text::ParseWords;

my $str = 'id=firewall time="2010-05-09 16:07:21 UTC"';
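# Split on spaces, keeping quoted strings together; 0 means drop the quotes
my @fields = quotewords(' ', 0, $str);
print Dumper \@fields;
# Split each field on the first "=" to build key/value pairs
my %hash = map split(/=/, $_, 2), @fields;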
print Dumper \%hash;

Output:

$VAR1 = [
          'id=firewall',
          'time=2010-05-09 16:07:21 UTC'
        ];
$VAR1 = {
          'time' => '2010-05-09 16:07:21 UTC',
          'id' => 'firewall'
        };

I also included how you can make the data more accessible by adding it to a hash. Note that hashes cannot contain duplicate keys, so every log line needs its own hash; otherwise each new time value would overwrite the previous one.
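One way to keep a separate hash per log line is to collect hash references in an array. The following is a minimal sketch along those lines; the second log line and the @records name are invented for the example, and it assumes every field contains an = sign.

```perl
use strict;
use warnings;
use Text::ParseWords;

# Sample input; the second line is made up for illustration
my @lines = (
    'id=firewall time="2010-05-09 16:07:21 UTC"',
    'id=firewall time="2010-05-09 16:07:22 UTC"',
);

my @records;
for my $line (@lines) {
    # A fresh hash for each line, so repeated keys never collide
    my %record = map split(/=/, $_, 2), quotewords(' ', 0, $line);
    push @records, \%record;
}

print "$_->{id} at $_->{time}\n" for @records;
```

Each element of @records is then an independent hash with its own id and time.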

Upvotes: 3
