Reputation: 1032
I'm using (?<=(?:(?:\w|,|'){1,20} ){2}(?:\w|,|'){1,20} ?)\.
But it's not working as expected:
use v5.35.2;
use warnings;
use strict;
my $str = shift // q{If you have to go. you go. That's no problem.};
my $regex = qr/(?<=(?:(?:\w|,|'){1,20} ){2}(?:\w|,|'){1,20} ?)\./;
my @all_parts = split $regex, $str;
say for @all_parts;
It should print out "If you have to go" and "you go. That's no problem".
Is there an easier way to achieve this?
Upvotes: 1
Views: 143
Reputation: 385907
split / [\w'] (?: [\s,]+ [\w']+ ){2} \K \. /x
Notes:
This uses \K instead of a lookbehind. That has the advantage that it can look further back than the 255 chars a real variable-length lookbehind can look back. But it has the disadvantage that it can't "look behind" further than the end of the previous match. This isn't a problem here.
Adding + after each existing + (making the quantifiers possessive) should make it a tiny bit faster.
This pattern considers a's to be one word, but the earlier answer can count it as two. For example, that answer considers the . to be preceded by three words in a's b. c.
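To make the last two notes concrete, here is a minimal sketch comparing the separator-based pattern above with the \b-based pattern from the other answer on the string a's b. c, plus the possessive variant. The helper count_parts and the variable names are made up for illustration and are not part of either answer.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: how many pieces does split produce for this regex?
sub count_parts {
    my ($re, $s) = @_;
    my @parts = split $re, $s;
    return scalar @parts;
}

# Pattern from this answer: words are delimited by whitespace or commas.
my $by_separator = qr/ [\w'] (?: [\s,]+ [\w']+ ){2} \K \. /x;

# Same pattern with possessive quantifiers (a + added after each +).
my $possessive = qr/ [\w'] (?: [\s,]++ [\w']++ ){2} \K \. /x;

# Pattern from the other answer: words start at a \b word boundary,
# so the engine can backtrack and split "a's" into "a'" and "s".
my $by_boundary = qr/(?:\b[\w,']+\s*){3}\K\./;

my $short = q{a's b. c};
my $long  = q{If you have to go. you go. That's no problem.};

say "separator pattern on short string: ", count_parts($by_separator, $short);
say "boundary pattern on short string:  ", count_parts($by_boundary,  $short);
say "separator pattern on long string:  ", count_parts($by_separator, $long);
say "possessive pattern on long string: ", count_parts($possessive,   $long);
```

On the short string the separator-based pattern should leave the text in one piece, since only two words precede the period, while the \b-based pattern splits it in two.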
Upvotes: 0
Reputation: 52439
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;
my $str = shift // q{If you have to go. you go. That's no problem.};
my $regex = qr/(?:\b[\w,']+\s*){3}\K\./;
my @all_parts = split $regex, $str;
say for @all_parts;
This splits the way you want. Using \K to discard everything before the period from the actual match is the key bit. (There are probably tweaks that could be made to the RE to better account for edge cases you didn't provide in your example string.)
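As a rough illustration of what \K does here, the sketch below prints the text that actually counts as the match, then tries one possible tweak: also consuming whitespace after the period so the second piece doesn't start with a space. The \s* addition and the name $trim_regex are my own assumptions, not something the question asked for.

```perl
use strict;
use warnings;
use feature 'say';

my $str = q{If you have to go. you go. That's no problem.};

# Everything matched before \K is dropped from the match itself,
# so only the period is treated as the separator by split.
my $matched = '';
if ($str =~ /(?:\b[\w,']+\s*){3}\K\./) {
    $matched = $&;
    say "separator text: '$matched'";
}

# Assumed tweak: swallow whitespace following the period as well.
my $trim_regex = qr/(?:\b[\w,']+\s*){3}\K\.\s*/;
my @parts = split $trim_regex, $str;
say "[$_]" for @parts;
```

The brackets in the output just make any leading or trailing whitespace in each piece visible.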
Upvotes: 2