dlw

Reputation: 27

How can a Perl regex re-use part of the previous match for the next match?

I need some Perl regular expression help. The following snippet of code:

use strict; 
use warnings; 
my $str = "In this example, A plus B equals C, D plus E plus F equals G and H plus I plus J plus K equals L"; 
my $word = "plus"; 
my @results = ();
1 while $str =~ s/(.{2}\b$word\b.{2})/push(@results,"$1\n")/e;
print @results;

Produces the following output:

A plus B
D plus E
2 plus F
H plus I
4 plus J
5 plus K

What I want to see is this, where a character already matched can appear in a new match in a different context:

A plus B
D plus E
E plus F
H plus I
I plus J
J plus K

How do I change the regular expression to get this result? Thanks --- Dan

Upvotes: 2

Views: 2657

Answers (6)

Alan Moore

Reputation: 75222

Another option is to use a lookahead:

use strict; 
use warnings; 
my $str = "In this example, A plus B equals C, D plus E "
        . "plus F equals G and H plus I plus J plus K equals L"; 
my $word = "plus"; 
my $chars = 2;
my @results = ();

push @results, $1 
  while $str =~ /(?=((.{0,$chars}?\b$word\b).{0,$chars}))\2/g;

print "'$_'\n" for @results;

Within the lookahead, capturing group 1 matches the word along with a variable number of leading and trailing context characters, up to whatever maximum you've set. When the lookahead finishes, the backreference \2 matches "for real" whatever was captured by group 2, which is the same as group 1 except that it stops at the end of the word. That sets pos where you want it, without requiring you to calculate how many characters you actually matched after the word.
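
To see the core idea in isolation, here is a stripped-down sketch. It is not the regex above: it assumes single capital-letter operands and a short sample string of my own, and it drops the `\2` part entirely. Because the lookahead consumes nothing, each new attempt starts only one character further on, so a capture may overlap an earlier one.

```perl
use strict;
use warnings;

# Sample input chosen for illustration only.
my $str = "D plus E plus F";
my @overlaps;

# The whole pattern is a zero-width lookahead; group 1 captures inside it.
# Nothing is consumed, so "E" can end one capture and start the next.
while ($str =~ /(?=(\b[A-Z] plus [A-Z]\b))/g) {
    push @overlaps, $1;
}

print "'$_'\n" for @overlaps;
```

The trade-off is that every position in the string is tried. The `\2` in the answer's regex avoids that by jumping pos to the end of the word after each hit.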

Upvotes: 2

ysth

Reputation: 98388

Given the "Full Disclosure" comment (but assuming .{0,35}, not .{35}), I'd do

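use strict;
use warnings;
use List::Util qw/max min/;

# $str and $word as in the question
my $str  = "In this example, A plus B equals C, D plus E plus F equals G and H plus I plus J plus K equals L";
my $word = "plus";
my $context = 35;
my @results;
while ( $str =~ /\b$word\b/g ) {
    # @- and @+ hold the start and end offsets of the last successful match
    my $pre = substr( $str, max(0, $-[0] - $context), min( $-[0], $context ) );
    my $post = substr( $str, $+[0], $context );
    my $match = substr( $str, $-[0], $+[0] - $-[0] );
    $pre =~ s/.*\n//s;   # keep only the part of $pre on the match's own line
    $post =~ s/\n.*//s;  # likewise for $post
    push @results, "$pre$match$post";
}
print "$_\n" for @results;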

You'd skip the substitutions if you really meant (?s:.{0,35}).
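
In case the purpose of those two substitutions isn't obvious: a plain `.` doesn't match a newline, so `.{0,35}` context would never cross a line boundary, and the substitutions trim the substr results to mimic that. A small sketch with made-up context strings (the values are my own, purely for illustration):

```perl
use strict;
use warnings;

# Hypothetical context strings that each spill over onto a neighbouring line.
my $pre  = "previous line\nA ";
my $post = " B\nnext line";

$pre  =~ s/.*\n//s;   # greedy: removes everything up to and including the last newline
$post =~ s/\n.*//s;   # removes everything from the first newline onward

print "'${pre}plus${post}'\n";
```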

Upvotes: 1

ghostdog74

Reputation: 342303

You don't have to use a regex. Just split the string on whitespace, loop over the items, check each one for "plus", and print the words before and after it.

use strict;
use warnings;

my $str = "In this example, A plus B equals C, D plus E plus F equals G and H plus I plus J plus K equals L";
my @s = split /\s+/, $str;
# Start at 1 and stop one short of the last index so both neighbours exist.
for my $i (1 .. $#s - 1) {
    if ($s[$i] eq "plus") {
        print "$s[$i-1] plus $s[$i+1]\n";
    }
}

Upvotes: 0

Michael Carman

Reputation: 30831

You can use m//g instead of s/// and assign to pos to rewind the match position to just before the second term, so that term is available to begin the next match:

use strict;
use warnings;

my $str  = 'In this example, A plus B equals C, D plus E plus F equals G and H plus I plus J plus K equals L';
my $word = 'plus';
my @results;

while ($str =~ /(.{2}\b$word\b(.{2}))/g) {
    push @results, "$1\n";
    pos $str -= length $2;
}
print @results;
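
If it helps to watch pos move, here is a small trace. It is only a sketch: the short sample string and the single-letter `[A-Z]` operands are assumptions of mine, not part of the code above.

```perl
use strict;
use warnings;

# Sample input chosen for illustration only.
my $str = "D plus E plus F";
my @trace;

while ($str =~ /([A-Z] plus ([A-Z]))/g) {
    my $end = pos $str;        # offset just past the whole match
    pos($str) -= length $2;    # step back over the trailing operand
    push @trace, "$1: pos $end -> " . pos($str);
}

print "$_\n" for @trace;
```

After each rewind the next search begins at the trailing operand, which is why "E" can close one result and open the next.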

Upvotes: 4

Sinan Ünür

Reputation: 118118

General advice: Don't use s/// when you want m//. Be specific in what you match.
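
As for where the stray digits in the question's output come from: with s///e the matched text is replaced by the value of the replacement expression, and push returns the new number of elements in the array. A minimal sketch of a single pass, using a shortened sample string of my own:

```perl
use strict;
use warnings;

# Sample input chosen for illustration only.
my $str = "D plus E plus F";
my @results;

# One substitution pass: the match "D plus E" is replaced by push's return value.
$str =~ s/(.{2}\bplus\b.{2})/push(@results, $1)/e;

print "captured: '$results[0]'\n";
print "string is now: '$str'\n";
```

The next pass over the modified string then picks up the digit as its left-hand context, which is how lines like "2 plus F" arise.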

The answer is pos:

#!/usr/bin/perl -l

use strict;
use warnings;

my $str = 'In this example, ' . 'A plus B equals C, ' .
          'D plus E plus F equals G ' .
          'and H plus I plus J plus K equals L';

my $word = "plus";

my @results;

while ( $str =~ /([A-Z] $word [A-Z])/g ) {
    push @results, $1;
    pos($str) -= 1;
}

print "'$_'" for @results;

Output:

C:\Temp> b
'A plus B'
'D plus E'
'E plus F'
'H plus I'
'I plus J'
'J plus K'

Upvotes: 6

Greg Hewgill

Reputation: 992767

Here's one way to do it:

use strict; 
use warnings; 
my $str = "In this example, A plus B equals C, D plus E plus F equals G and H plus I plus J plus K equals L"; 
my $word = "plus"; 
my @results = ();
my $i = 0;
while (substr($str, $i) =~ /(.{2}\b$word\b.{2})/) {
    push @results, "$1\n";
    $i += $-[0] + 1;
}
print @results;

It's not terribly Perl-ish, but it works and it doesn't use too many obscure regular expression tricks. However, you might have to look up the function of the special variable @- in perlvar.
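
For anyone who hasn't met `@-` yet, a minimal sketch on a short sample string (my own, for illustration): `$-[0]` is the offset where the last successful match started, and its companion `$+[0]` is the offset just past its end.

```perl
use strict;
use warnings;

# Sample input chosen for illustration only.
my $str = "A plus B";
my ($start, $end, $matched);

if ($str =~ /plus/) {
    ($start, $end) = ($-[0], $+[0]);               # offsets of the whole match
    $matched = substr($str, $start, $end - $start);
    print "matched '$matched' from offset $start to $end\n";
}
```

In the loop above the match runs against `substr($str, $i)`, so `$-[0]` is relative to that substring, which is why it gets added to `$i` rather than assigned.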

Upvotes: 0
