Reputation: 59
I need to extract the message attribute from the following string (i.e. I want to extract The String "test" appears 4 times in the file.).
severity="warning" message="The String "test" appears 4 times in the file." source="com.puppycrawl.tools.checkstyle.checks.coding.MultipleStringLiteralsCheck"
I've tried using the regular expression message="([^"]*)", but this stops at the first " that appears, so only The String is returned in this case.
Is there a way to ignore the inner quotes within the message attribute and capture the entire attribute?
Upvotes: 1
Views: 143
Reputation: 126722
This solution keeps fetching characters from the string until a new label like source= is encountered. All parameter values are stored in the hash %params, so the value for message is just $params{message}.
I've used Data::Dump only to display the complete hash contents once the string has been parsed.
use strict;
use warnings 'all';
use feature 'say';
my $str = 'severity="warning" message="The String "test" appears 4 times in the file." source="com.puppycrawl.tools.checkstyle.checks.coding.MultipleStringLiteralsCheck"';
my %params;
while ( $str =~ / (\w+) \s* = \s* " ( (?: . (?! \w+ \s* = ) )* ) " /gsx ) {
    $params{$1} = $2;
}
say $params{message};
use Data::Dump;
dd \%params;
Output:
The String "test" appears 4 times in the file.
{
message => "The String \"test\" appears 4 times in the file.",
severity => "warning",
source => "com.puppycrawl.tools.checkstyle.checks.coding.MultipleStringLiteralsCheck",
}
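For readers who are not using Perl, here is a rough Python sketch of the same idea, assuming the input looks exactly like the sample line in the question. The names PAIR, line and params are my own and are not part of the original answer; the pattern is a direct port of the one above, with re.VERBOSE and re.DOTALL standing in for /x and /s.

```python
import re

# Sample input copied from the question.
line = ('severity="warning" '
        'message="The String "test" appears 4 times in the file." '
        'source="com.puppycrawl.tools.checkstyle.checks.coding.MultipleStringLiteralsCheck"')

# PAIR is a hypothetical name; the pattern mirrors the Perl regex above.
PAIR = re.compile(r'''
    (\w+) \s* = \s* "               # key, equals sign, opening quote
    ( (?: . (?! \w+ \s* = ) )* )    # any character not followed by the next key=
    "                               # closing quote
''', re.VERBOSE | re.DOTALL)

# Collect every key/value pair into a dict, like %params in the Perl code.
params = {key: value for key, value in PAIR.findall(line)}

print(params["message"])
```

As with the Perl version, this relies on the value never containing something that looks like key= itself.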
Upvotes: 1
Reputation: 626747
If we can assume that the key always consists of alphanumerics or underscore symbols (\w+) and is followed by =, and that the values do not contain that pattern, you can use a lazy quantifier with a dot, .*?, and check the trailing boundary with a positive lookahead. Thus, as a quick-and-dirty one-time fix, you can use
message="(.*?)"(?=\s+\w+=|$)
Note that . does not match newline symbols by default; if the message can span several lines, you will need the /s modifier.
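A minimal sketch of how this pattern behaves, written in Python purely for illustration (the question does not say which regex engine is in use). The names MESSAGE and extract_message are hypothetical, and re.DOTALL plays the role of the /s modifier.

```python
import re

# The pattern from this answer; re.DOTALL lets . match newlines as well.
MESSAGE = re.compile(r'message="(.*?)"(?=\s+\w+=|$)', re.DOTALL)

def extract_message(text):
    """Return the message value, or None if no message attribute is found."""
    match = MESSAGE.search(text)
    return match.group(1) if match else None

sample = ('severity="warning" '
          'message="The String "test" appears 4 times in the file." '
          'source="com.puppycrawl.tools.checkstyle.checks.coding.MultipleStringLiteralsCheck"')

print(extract_message(sample))

# Limitation: a quote followed by whitespace and key= inside the value
# ends the match too early.
print(extract_message('message="say "x" y=1 here" source="s"'))
```

The second call shows the stated assumption being violated: the match stops at the quote before y=1, so only part of the value comes back.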
The input you have needs fixing by all means.
Upvotes: 1
Reputation: 74018
If the attributes are always in this order, i.e. source follows message, you might try to make it a bit more robust:
message="(.*?)"\s+source="
This will break, of course, if a quote followed by whitespace and source=" occurs in the message.
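To make the trade-off concrete, here is a small Python sketch of this pattern, assuming source really does come straight after message. The names BEFORE_SOURCE and message_before_source are made up for the example.

```python
import re

# The pattern from this answer: the value ends at the quote that is
# followed by whitespace and the source attribute.
BEFORE_SOURCE = re.compile(r'message="(.*?)"\s+source="')

def message_before_source(text):
    """Return the message value, or None if the pattern does not match."""
    match = BEFORE_SOURCE.search(text)
    return match.group(1) if match else None

sample = ('severity="warning" '
          'message="The String "test" appears 4 times in the file." '
          'source="com.puppycrawl.tools.checkstyle.checks.coding.MultipleStringLiteralsCheck"')

print(message_before_source(sample))

# If the attributes arrive in a different order, nothing matches at all.
print(message_before_source('source="s" message="The String "test""'))
```

The second call returns None, which is at least an obvious failure rather than a silently truncated value.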
Upvotes: 1