Reputation: 5069
I need to parse some logs that use a space (" ") as the separator while respecting double or single quotes.
For example
id=firewall time="2010-05-09 16:07:21 UTC" 1.1.1.1 ...
should be parsed as
id=firewall
time="2010-05-09 16:07:21 UTC"
1.1.1.1
The logs are
I tried to use Text::CSV_XS because it's much faster than the pure-Perl parsers. However, the following code doesn't do what I expected, because the log lines are not valid CSV strings.
use Text::CSV_XS;
$a = 'id=firewall time="2010-05-09 16:07:21 UTC"';
$userDefinedSeparator = Text::CSV_XS->new({sep_char => " "});
print "$userDefinedSeparator\n";
$userDefinedSeparator->parse($a);
my $e;
foreach $e ($userDefinedSeparator->fields) {
print $e, "\n";
}
Is there a fast parser that can handle logs like the one above? It would be nice if Text::CSV_XS could be configured to do the desired parsing.
Thanks to @ThisSuitIsBlackNot who suggested rewriting this question.
Upvotes: 1
Views: 297
Reputation: 126722
I answered this in my response to your comment on my solution to your previous question.
Here is the answer I gave before, together with the new data that you have shown in this question.
The problem I had with your previous question is that you showed nothing but key=value pairs, so I assumed that was all you had in your data.
I hope this works for you.
use strict;
use warnings;
my $string = 'id=firewall time="2010-05-09 16:07:21 UTC" 1.1.1.1 ...';
my @fields = $string =~ / (?: "[^"]*" | \S )+ /xg;
print "$_\n" for @fields;
Output:
id=firewall
time="2010-05-09 16:07:21 UTC"
1.1.1.1
...
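The question also mentions single quotes, which the pattern above does not treat specially. A minimal sketch of one way to extend it, assuming single-quoted values look like the double-quoted ones (the `msg='login ok'` field in the sample line is made up for illustration and is not from the original logs):

```perl
use strict;
use warnings;

# Hypothetical sample line: the single-quoted msg field is an assumption.
my $line = q{id=firewall time="2010-05-09 16:07:21 UTC" msg='login ok' 1.1.1.1};

# A field is one or more of: a double-quoted run, a single-quoted run,
# or a single non-space character. Spaces inside quotes stay in the field.
my @fields = $line =~ / (?: "[^"]*" | '[^']*' | \S )+ /xg;

print "$_\n" for @fields;
```

With this pattern an unbalanced quote is not an error: it falls through to the `\S` branch and is treated as an ordinary character, so a stray apostrophe will not swallow the rest of the line.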
Upvotes: 1
Reputation: 3484
I'm half certain that you're going to tell me more about the log format after I submit this answer, but here goes.
Only you know what your logs look like. If their format is regular, you'll have an easier time parsing them.
But given what you've supplied, you can split on spaces into an array, and then regroup the timestamp:
my $a = q(id=firewall time="2010-05-09 16:07:21 UTC" 1.1.1.1);
my @f = split(/ /, $a);
my $id = $f[0];
my $time = join(' ', @f[1..3]);
print "$id\n$time\n$f[4]\n";
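Regrouping by fixed index only works while the timestamp always sits in the same position. If that is too brittle, one possible alternative is the core Text::ParseWords module, whose `quotewords` function splits on a delimiter pattern while leaving quoted runs intact. This is only a sketch against the sample line from the question, and I have not measured how its speed compares with Text::CSV_XS:

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $line = q(id=firewall time="2010-05-09 16:07:21 UTC" 1.1.1.1);

# Split on runs of whitespace. The true second argument ("keep")
# leaves the quote characters in the returned fields.
my @f = quotewords('\s+', 1, $line);

print "$_\n" for @f;
```

One caveat: `quotewords` expects quotes to be balanced, so a line containing a lone apostrophe or double quote may come back as an empty list. Check the result before using it.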
Upvotes: 0