packetie

Reputation: 5069

Fast parser for logs

I need to parse some logs that use a space (" ") as the separator while respecting double or single quotes.

For example

id=firewall time="2010-05-09 16:07:21 UTC" 1.1.1.1 ...

should be parsed as

id=firewall
time="2010-05-09 16:07:21 UTC"
1.1.1.1

The logs are

I tried Text::CSV_XS because it is much faster than the pure-Perl parsers. However, the following code doesn't do what I expected, because the log lines are not valid CSV strings.

use Text::CSV_XS;

$a = 'id=firewall time="2010-05-09 16:07:21 UTC"';

$userDefinedSeparator = Text::CSV_XS->new({sep_char => " "});
print "$userDefinedSeparator\n";
$userDefinedSeparator->parse($a);
my $e;
foreach $e ($userDefinedSeparator->fields) {
    print $e, "\n";
}

Is there a fast parser that can handle logs like the one above? It would be nice if Text::CSV_XS could be configured to do the desired parsing.

Thanks to @ThisSuitIsBlackNot who suggested rewriting this question.

Upvotes: 1

Views: 297

Answers (2)

Borodin

Reputation: 126722

I answered this in my response to your comment on my solution to your previous question.

Here is the answer I gave before, together with the new data that you have shown in this question.

The problem I had with your previous question is that you showed nothing but key=value pairs, so I assumed that that was all you had in your data.

I hope this works for you.

use strict;
use warnings;

my $string = 'id=firewall time="2010-05-09 16:07:21 UTC" 1.1.1.1 ...';

my @fields = $string =~ / (?: "[^"]*" | \S )+ /xg;

print "$_\n" for @fields;

output

id=firewall
time="2010-05-09 16:07:21 UTC"
1.1.1.1
...
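
The question also mentions single-quoted values. Here is a minimal sketch of the same regex with a single-quote alternative added. The `split_fields` name and the `msg='login ok'` field are made up for illustration, and it assumes quotes are never escaped or nested inside a value.

```perl
use strict;
use warnings;

# Hypothetical helper: split a log line on whitespace, keeping
# double- or single-quoted runs (spaces included) inside one field.
# Assumes quotes are never escaped or nested within a value.
sub split_fields {
    my ($line) = @_;
    return $line =~ / (?: "[^"]*" | '[^']*' | \S )+ /xg;
}

# The msg='...' field is invented here to exercise the single-quote branch.
my $line = q{id=firewall time="2010-05-09 16:07:21 UTC" msg='login ok' 1.1.1.1};
print "$_\n" for split_fields($line);
```

Each match is a run of non-space characters and quoted sections, so a quoted value glued to a key (as in `time="..."`) stays in one field with its quotes intact.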

Upvotes: 1

Len Jaffe

Reputation: 3484

I'm half certain that you're going to tell me more about the log format after I submit this answer but here goes.

Only you know what your logs look like. If their format is regular, you'll have an easier time parsing them.

But given what you've supplied, you can split on spaces into an array, and then regroup the timestamp:

 my $a = q(id=firewall time="2010-05-09 16:07:21 UTC" 1.1.1.1);
 my @f = split(/ /, $a);
 my $id = $f[0];
 my $time = join(' ', @f[1..3]);

 print "$id\n$time\n$f[4]\n";
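
The fixed indices above only work if the timestamp always sits in fields 1 to 3. As a rough sketch of the same split-then-regroup idea without fixed positions, the pieces can be merged until the double quotes balance. The `regroup` name is hypothetical, and this assumes single spaces between fields and no escaped quotes.

```perl
use strict;
use warnings;

# Hypothetical helper: split on single spaces, then glue pieces back
# together while we are inside an unclosed double-quoted value.
sub regroup {
    my ($line) = @_;
    my @fields;
    my $open = 0;                       # true while inside a quoted value
    for my $piece (split / /, $line) {
        if ($open) {
            $fields[-1] .= " $piece";   # continue the quoted field
        } else {
            push @fields, $piece;       # start a new field
        }
        my $quotes = () = $piece =~ /"/g;   # count double quotes in this piece
        $open = !$open if $quotes % 2;      # an odd count toggles the state
    }
    return @fields;
}

my $log = q(id=firewall time="2010-05-09 16:07:21 UTC" 1.1.1.1);
print "$_\n" for regroup($log);
```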

Upvotes: 0
