cachemoi

Reputation: 373

How do I find overlapping regex in a string?

I have this string:

my $line = "MZEFSRGGRMEAZFE*MQZEFFMAEZF*";

I want to find every substring that starts with M and ends with *, with no * inside it. This means the string above should give me 4 elements in my final array:

my @ORF = ("MZEFSRGGRMEAZFE*", "MEAZFE*", "MQZEFFMAEZF*", "MAEZF*");

A simple regex will not do since it does not find overlapping substrings. Is there a simple way to do this?

Upvotes: 1

Views: 95

Answers (2)

simbabque

Reputation: 54381

You can also use a recursive approach instead of advanced regex features. The program below takes each match and re-parses it with the leading M removed, so it won't match the whole thing again.

use strict;
use warnings;
use Data::Printer;

my $line = "MZEFSRGGRMEAZFE*MQZEFFMAEZF*";
my @matches;

sub parse {
    my ( $string ) = @_;

    while ($string =~ m/(M[^*]+\*)/g ) {
        push @matches, $1;
        parse(substr $1, 1);
    }
}

parse($line);
p @matches;

Here's the output:

[
    [0] "MZEFSRGGRMEAZFE*",
    [1] "MEAZFE*",
    [2] "MQZEFFMAEZF*",
    [3] "MAEZF*"
]
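
If you would rather not push onto a file-level array, the same recursion can return its matches instead. This is only a sketch of that variant: the name find_orfs is made up for illustration, and $1 is copied into a lexical before recursing so the inner match cannot clobber it.

```perl
use strict;
use warnings;

my $line = "MZEFSRGGRMEAZFE*MQZEFFMAEZF*";

# Hypothetical helper: returns every M...* substring, including nested ones.
sub find_orfs {
    my ($string) = @_;
    my @found;

    while ( $string =~ m/(M[^*]+\*)/g ) {
        my $match = $1;    # copy before the recursive call runs its own match
        # Re-scan the match without its leading M to pick up inner candidates.
        push @found, $match, find_orfs( substr $match, 1 );
    }
    return @found;
}

my @orfs = find_orfs($line);
print "$_\n" for @orfs;
```

The matches come back in the same order as in the output above.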

Upvotes: 2

Sobrique

Reputation: 53508

Regular expression matching consumes the string as it matches; that's by design.
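
To see that consumption with the string from the question, here is a minimal sketch of a plain global match (the array name @plain is only for illustration). Each match resumes after the end of the previous one, so the substrings that start at an inner M are never tried.

```perl
use strict;
use warnings;

my $line = "MZEFSRGGRMEAZFE*MQZEFFMAEZF*";

# A plain /g match continues from where the previous match ended,
# so candidates that begin inside an earlier match are skipped.
my @plain = $line =~ m/(M[^*]+\*)/g;

print "$_\n" for @plain;
```

This prints only MZEFSRGGRMEAZFE* and MQZEFFMAEZF*; the two inner substrings are missed.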

You can use a lookahead assertion to avoid this (see PerlMonks: Using Look-ahead and Look-behind).

So something like this will work:

#!/usr/bin/env perl
use strict;
use warnings;

use Data::Dumper;

my $line = "MZEFSRGGRMEAZFE*MQZEFFMAEZF*";
my @matches = $line =~ m/(?=(M[^*]+))/g;
print Dumper \@matches;

Which gives you:

$VAR1 = [
          'MZEFSRGGRMEAZFE',
          'MEAZFE',
          'MQZEFFMAEZF',
          'MAEZF'
        ];
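
Note that this pattern captures each substring without its trailing *. If you want the * included, as in the expected array from the question, one way (a sketch, with @orf as an illustrative name) is to add \* inside the capture. That also means an M with no * after it would not be reported.

```perl
use strict;
use warnings;

my $line = "MZEFSRGGRMEAZFE*MQZEFFMAEZF*";

# The lookahead itself matches zero characters, so the next attempt starts
# one position further along and overlapping candidates are all found.
# The capture group inside it holds the text, now including the closing *.
my @orf = $line =~ m/(?=(M[^*]+\*))/g;

print "$_\n" for @orf;
```

This prints MZEFSRGGRMEAZFE*, MEAZFE*, MQZEFFMAEZF* and MAEZF*, one per line.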

Upvotes: 5
