jonah_w

Reputation: 1032

In Perl, match a dot when there are at least three words before it

I'm using (?<=(?:(?:\w|,|'){1,20} ){2}(?:\w|,|'){1,20} ?)\. But it's not working as expected:

use v5.35.2;
use warnings;
use strict;

my $str = shift // q{If you have to go. you go. That's no problem.}; 

my $regex = qr/(?<=(?:(?:\w|,|'){1,20} ){2}(?:\w|,|'){1,20} ?)\./;

my @all_parts = split $regex, $str;

say for @all_parts;

It should print two lines: If you have to go and you go. That's no problem

Is there an easier way to achieve this?

Upvotes: 1

Views: 143

Answers (2)

ikegami

Reputation: 385907

split / [\w'] (?: [\s,]+ [\w']+ ){2} \K \. /x

Notes:

  • It's usually easier and more efficient to use \K instead of a lookbehind. It also has the advantage that it can look further back than the 255 chars a real variable-length lookbehind can. Its disadvantage is that it can't "look behind" further than the end of the previous match, but that isn't a problem here.
  • Feel free to remove the whitespace. If you do, you can also remove the x.
  • Adding a + after each existing + should make it a tiny bit faster.
  • You obviously consider a's to be one word, but the earlier answer can count it as two. For example, in a's b. c. it considers the first . to be preceded by three words.
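
To make the last two notes concrete, here is a small sketch that runs both patterns on a's b. c. and also tries the possessive (++) variant on the string from the question. The variable names are made up for illustration, and the possessive form is only my reading of "adding a + after each existing +".

```perl
use strict;
use warnings;

# Pattern from this answer: words must be separated by whitespace or commas.
my $by_separators = qr/ [\w'] (?: [\s,]+ [\w']+ ){2} \K \. /x;

# Same pattern with possessive quantifiers (++), which never give back
# characters once matched. The result should be identical here.
my $possessive = qr/ [\w'] (?: [\s,]++ [\w']++ ){2} \K \. /x;

# Pattern from the earlier answer: words are delimited by \b, so the
# apostrophe in "a's" can act as a word break.
my $by_boundaries = qr/(?:\b[\w,']+\s*){3}\K\./;

my $sample = q{a's b. c.};
my @strict_parts = split $by_separators, $sample;
my @loose_parts  = split $by_boundaries, $sample;
my @poss_parts   = split $possessive,
    q{If you have to go. you go. That's no problem.};

print "separators: ", join(' | ', @strict_parts), "\n";
print "boundaries: ", join(' | ', @loose_parts),  "\n";
print "possessive: ", join(' | ', @poss_parts),   "\n";
```

The separator-based pattern leaves a's b. c. in one piece, while the \b-based one splits after a's b because it can read a, 's and b as three words.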

Upvotes: 0

Shawn

Reputation: 52439

#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;

my $str = shift // q{If you have to go. you go. That's no problem.}; 
my $regex = qr/(?:\b[\w,']+\s*){3}\K\./; 
my @all_parts = split $regex, $str;
say for @all_parts;

This splits the way you want. Using \K to discard everything before the period from the actual match is the key bit. (There are probably tweaks that could be made to the regex to better account for edge cases not covered by your example string.)
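
As a rough illustration of what \K does, the sketch below runs the same regex as a plain match and inspects what Perl reports as the matched text. The variable names $matched and $before are just for this example.

```perl
use strict;
use warnings;

my $str   = q{If you have to go. you go. That's no problem.};
my $regex = qr/(?:\b[\w,']+\s*){3}\K\./;

my ($matched, $before) = ('', '');
if ($str =~ $regex) {
    # Because of \K, the reported match starts at the period, even though
    # the three preceding words had to be present for the match to succeed.
    $matched = $&;
    # $-[0] is the offset where the reported match begins.
    $before  = substr($str, 0, $-[0]);
}

print "matched: '$matched'\n";
print "before:  '$before'\n";
```

Since split treats only the reported match as the separator, the words before the period stay in the returned pieces rather than being consumed.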

Upvotes: 2
