Reputation: 27
I need some Perl regular expression help. The following snippet of code:
use strict;
use warnings;
my $str = "In this example, A plus B equals C, D plus E plus F equals G and H plus I plus J plus K equals L";
my $word = "plus";
my @results = ();
1 while $str =~ s/(.{2}\b$word\b.{2})/push(@results,"$1\n")/e;
print @results;
Produces the following output:
A plus B
D plus E
2 plus F
H plus I
4 plus J
5 plus K
What I want to see is this, where a character already matched can appear in a new match in a different context:
A plus B
D plus E
E plus F
H plus I
I plus J
J plus K
How do I change the regular expression to get this result? Thanks --- Dan
Upvotes: 2
Views: 2657
Reputation: 75222
Another option is to use a lookahead:
use strict;
use warnings;
my $str = "In this example, A plus B equals C, D plus E "
. "plus F equals G and H plus I plus J plus K equals L";
my $word = "plus";
my $chars = 2;
my @results = ();
push @results, $1
while $str =~ /(?=((.{0,$chars}?\b$word\b).{0,$chars}))\2/g;
print "'$_'\n" for @results;
Within the lookahead, capturing group 1 matches the word along with a variable number of leading and trailing context characters, up to whatever maximum you've set. When the lookahead finishes, the backreference \2 matches "for real" whatever was captured by group 2, which is the same as group 1 except that it stops at the end of the word. That sets pos where you want it, without requiring you to calculate how many characters you actually matched after the word.
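To see how the backreference leaves pos at the end of the word rather than at the end of the trailing context, it can help to record both groups and pos on each iteration. This is only a sketch: the helper name trace_matches and the shortened test string are made up for illustration, while the regex is the one shown above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer): returns one
# [group 1, group 2, pos] triple per match so the effect of the
# \2 backreference can be inspected.
sub trace_matches {
    my ($str, $word, $chars) = @_;
    my @trace;
    while ($str =~ /(?=((.{0,$chars}?\b$word\b).{0,$chars}))\2/g) {
        push @trace, [ $1, $2, pos($str) ];
    }
    return @trace;
}

printf "group 1 = '%s', group 2 = '%s', pos = %d\n", @$_
    for trace_matches("D plus E plus F", "plus", 2);
```

For this input the second match should report group 1 as 'E plus F' even though the E was already part of the first match's context, because only 'D plus' was actually consumed the first time.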
Upvotes: 2
Reputation: 98388
Given the "Full Disclosure" comment (but assuming .{0,35}, not .{35}), I'd do:
use List::Util qw/max min/;
my $context = 35;
while ( $str =~ /\b$word\b/g ) {
my $pre = substr( $str, max(0, $-[0] - $context), min( $-[0], $context ) );
my $post = substr( $str, $+[0], $context );
my $match = substr( $str, $-[0], $+[0] - $-[0] );
$pre =~ s/.*\n//s;
$post =~ s/\n.*//s;
push @results, "$pre$match$post";
}
print for @results;
You'd skip the substitutions if you really meant (?s:.{0,35}).
Upvotes: 1
Reputation: 342303
You don't have to do the matching with a regex. Just split the string on whitespace, loop over the items, check each one for "plus", and take the words before and after it.
my $str = "In this example, A plus B equals C, D plus E plus F equals G and H plus I plus J plus K equals L";
my @s = split /\s+/, $str;
# Start at 1 and stop one short of the end so that both
# neighbours of $s[$i] always exist.
for my $i (1 .. $#s - 1) {
    if ( $s[$i] eq "plus" ) {
        print "$s[$i-1] plus $s[$i+1]\n";
    }
}
Upvotes: 0
Reputation: 30831
You can use m//g instead of s/// and assign to the pos function to rewind the match location before the second term:
use strict;
use warnings;
my $str = 'In this example, A plus B equals C, D plus E plus F equals G and H plus I plus J plus K equals L';
my $word = 'plus';
my @results;
while ($str =~ /(.{2}\b$word\b(.{2}))/g) {
push @results, "$1\n";
pos $str -= length $2;
}
print @results;
Upvotes: 4
Reputation: 118118
General advice: Don't use s/// when you want m//. Be specific in what you match.
The answer is pos:
#!/usr/bin/perl -l
use strict;
use warnings;
my $str = 'In this example, ' . 'A plus B equals C, ' .
'D plus E plus F equals G ' .
'and H plus I plus J plus K equals L';
my $word = "plus";
my @results;
while ( $str =~ /([A-Z] $word [A-Z])/g ) {
push @results, $1;
pos($str) -= 1;
}
print "'$_'" for @results;
Output:
C:\Temp> b
'A plus B'
'D plus E'
'E plus F'
'H plus I'
'I plus J'
'J plus K'
Upvotes: 6
Reputation: 992767
Here's one way to do it:
use strict;
use warnings;
my $str = "In this example, A plus B equals C, D plus E plus F equals G and H plus I plus J plus K equals L";
my $word = "plus";
my @results = ();
my $i = 0;
while (substr($str, $i) =~ /(.{2}\b$word\b.{2})/) {
push @results, "$1\n";
$i += $-[0] + 1;
}
print @results;
It's not terribly Perl-ish, but it works and it doesn't use too many obscure regular expression tricks. However, you might have to look up the function of the special variable @- in perlvar.
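For reference, $-[0] holds the offset at which the last successful match started and $+[0] the offset just past its end, both relative to the string that was matched. In the loop above that string is the substr result, which is why the offset is added to $i rather than assigned to it. A small sketch, with a hypothetical helper name match_span:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the (start, end) offsets of the first
# whole-word match of $word in $text, read from @- and @+, or an
# empty list if there is no match.
sub match_span {
    my ($text, $word) = @_;
    return unless $text =~ /\b$word\b/;
    return ( $-[0], $+[0] );
}

my ($start, $end) = match_span("A plus B", "plus");
print "match starts at $start and ends before $end\n";
```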
Upvotes: 0