Reputation: 6842
I have a string where different predefined keywords introduce different data. Is there a way to parse it with a clever regexp, or something similar? Here is an example:
Keywords can be "first name: " and "last name: ". Now I want to parse:
"character first name: Han last name: Solo"
into
{ "first name: " => "Han ", "last name: " => "Solo" }
Of course, the order of the keywords in the input string is not fixed. This should also work on:
"character last name: Solo first name: Han"
I understand there are issues to be raised with spaces and so on. I'll ignore them here.
I know how to solve this problem looping on the different keywords, but I don't find that very pretty.
Split almost fits the bill. Its only problem is that it returns an array and not a hash, so I don't know which is the first name or the last name.
My example is somewhat misleading. Here is another one:
my @keywords = ("marker 1", "marker 2", "marker 3");
my $rawString = "beginning marker 1 one un marker 2 two deux marker 3 three trois and the rest";
my %result;
# <grind result>
print Dumper(\%result);
will print:
$VAR1 = {
'marker 2' => ' two deux ',
'marker 3' => ' three trois and the rest',
'marker 1' => ' one un '
};
Upvotes: 2
Views: 795
Reputation: 40142
Here is a solution using split (with separator retention mode) that is extensible with other keys:
use warnings;
use strict;
my $str = "character first name: Han last name: Solo";
my @keys = ('first name:', 'last name:');
my $regex = join '|' => @keys;
my ($prefix, %hash) = split /($regex)\s*/ => $str;
print "$_ $hash{$_}\n" for keys %hash;
which prints:
last name: Solo
first name: Han
To handle keys that contain regex metacharacters, replace the my $regex = ... line with:
my $regex = join '|' => map {quotemeta} @keys;
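Applied to the second example from the question, the same idea could look like the sketch below. It assumes the keywords are matched literally and that every keyword is followed by some text; the variable names are taken from the question.

```perl
use strict;
use warnings;
use Data::Dumper;

my @keywords  = ("marker 1", "marker 2", "marker 3");
my $rawString = "beginning marker 1 one un marker 2 two deux marker 3 three trois and the rest";

# Escape each keyword so it is matched literally, then build one alternation.
my $regex = join '|', map { quotemeta } @keywords;

# The capturing group makes split return the separators too, so the list
# after the leading prefix alternates keyword, value, keyword, value, ...
my ($prefix, %result) = split /($regex)/, $rawString;

print Dumper(\%result);
```

One caveat: if the string can end right after a keyword, split drops the trailing empty field and the hash assignment receives an odd-sized list. Passing a limit of -1 as the third argument to split keeps that empty field.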
Upvotes: 7
Reputation: 118128
The following loops over the string once to find matches (after normalizing the string). The only way you can avoid the loop is if each keyword can only appear once in the text. If that were the case, you could write
my %matches = $string =~ /($re):\s+(\S+)/g;
and be done with it.
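As a self-contained sketch of that one-liner, assuming each keyword occurs at most once, with $re written by hand as a stand-in for what presuf() would build:

```perl
use strict;
use warnings;

# Assumption: each keyword occurs at most once in the input.
my $string = "character first name: Han last name: Solo";

# Hand-written stand-in for the pattern built with presuf() in the full script.
my $re = qr/(?:first|last) name/;

# In list context, a /g match returns every captured pair in order, which is
# exactly the key/value list a hash assignment expects.
my %matches = $string =~ /($re):\s+(\S+)/g;

print "$_ => $matches{$_}\n" for sort keys %matches;
```

If a keyword did appear twice, the later value would silently overwrite the earlier one in the hash.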
The script below deals with possible multiple occurrences.
#!/usr/bin/perl
use strict; use warnings;
use File::Slurp;
use Regex::PreSuf;
my $re = presuf( 'first name', 'last name' );
my $string = read_file \*DATA;
$string =~ s/\n+/ /g;
my %matches;
while ( $string =~ /($re):\s+(\S+)/g ) {
push @{ $matches{ $1 } }, $2;
}
use Data::Dumper;
print Dumper \%matches;
__DATA__
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore character first name: Han last
name: Solo et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud character last name: Solo first name: Han exercitation
ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute
irure dolor in reprehenderit in voluptate velit esse cillum
character last name: Solo first name: Han dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum
Upvotes: 3
Reputation: 37146
use strict;
use warnings;
use Data::Dump 'dump'; # dump allows you to see what %character 'looks' like
my %character;
my $string = "character first name: Han last name: Solo"; # sample input from the question
my $nameTag = qr{(?:first|last) name:\s*};
# Use an array slice to populate the hash in one go
@character{ ($1, $3) } = ($2, $4) if $string =~ /($nameTag)(.+)($nameTag)(.+)/;
dump %character; # returns ("last name: ", "Solo", "first name: ", "Han ")
Upvotes: 2
Reputation: 39158
This works.
use 5.010;
use Regexp::Grammars;
my $parser = qr{
(?:
<[Name]>{2}
)
<rule: Name>
((?:fir|la)st name: \w+)
}x;
while (<DATA>) {
/$parser/;
use Data::Dumper; say Dumper $/{Name};
}
__DATA__
character first name: Han last name: Solo
character last name: Solo first name: Han
Output:
$VAR1 = [
' first name: Han',
' last name: Solo'
];
$VAR1 = [
' last name: Solo',
' first name: Han'
];
Upvotes: 2
Reputation: 86774
This is possible IF:
1) You can identify a small set of regexes that can pick out the tags
2) The regex for extracting the value can be written so that it picks out only the value and ignores following extraneous data, if any, between the end of the value and the start of the next tag.
Here's a sample of how to do it with a very simple input string. This is a debug session:
DB<14> $a = "a 13 b 55 c 45";
DB<15> %$b = $a =~ /([abc])\s+(\d+)/g;
DB<16> x $b
0 HASH(0x1080b5f0)
'a' => 13
'b' => 55
'c' => 45
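When a value simply runs up to the next tag, one possible way to write the value regex is a lazy match bounded by a lookahead. This is only a sketch and not part of the debug session above; the tags and input are borrowed from the question, and the names $input, $tag and %fields are made up for illustration.

```perl
use strict;
use warnings;

my $input = "character last name: Solo first name: Han";

# Tags from the question; quotemeta guards against regex metacharacters.
my $tag = join '|', map { quotemeta } ('first name:', 'last name:');

# (.*?) grows lazily until the lookahead sees the next tag or the end of the
# input, so each value stops before the following tag without consuming it.
my %fields = $input =~ /($tag)\s*(.*?)\s*(?=$tag|\z)/g;

print "$_ $fields{$_}\n" for sort keys %fields;
```

The same pattern handles either keyword order, because the lookahead does not care which tag comes next.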
Upvotes: 0
Reputation: 3364
Use Text::ParseWords. It probably doesn't do all of what you want, but you're much better building on it than trying to solve the whole problem from scratch.
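To give an idea of what the module offers, here is a minimal sketch using its shellwords function. It assumes you can quote multi-word keys in the input, which the strings in the question do not do, so treat it as a starting point only.

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);

# Assumption: multi-word keys are quoted so that each key and each value
# forms a single shell-style word.
my $line = q{"first name" Han "last name" Solo};

# shellwords splits on whitespace but keeps quoted phrases together, so the
# returned list alternates key, value, key, value.
my %fields = shellwords($line);

print "$_ => $fields{$_}\n" for sort keys %fields;
```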
Upvotes: -1