How do I find overlapping regex in a string?

Question

I have this string:

my $line = "MZEFSRGGRMEAZFE*MQZEFFMAEZF*";

I want to find every substring that starts with M and ends with *, with no * inside it. This means the string above should give me 4 elements in my final array:

@ORF = ("MZEFSRGGRMEAZFE*", "MEAZFE*", "MQZEFFMAEZF*", "MAEZF*");

A simple regex will not do since it does not find overlapping substrings. Is there a simple way to do this?

Sobrique · Accepted Answer

Regular expression matching consumes the string as it matches: once characters are part of a match, the next match starts after them. That's by design.
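
As a quick illustration of that consumption, here is a minimal sketch using the string from the question. The pattern and the variable name @plain are illustrative and not part of the original answer.

```perl
use strict;
use warnings;

my $line = "MZEFSRGGRMEAZFE*MQZEFFMAEZF*";

# Plain global match: each match consumes its characters, so the next
# attempt starts right after the previous '*'.
my @plain = $line =~ m/(M[^*]*\*)/g;

# Prints: 2 matches: MZEFSRGGRMEAZFE* MQZEFFMAEZF*
print scalar(@plain), " matches: @plain\n";
```

The inner candidates MEAZFE* and MAEZF* are skipped because their characters were already used by the longer matches.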

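You can use a lookahead expression to avoid this. A lookahead is zero-width, so a capture group placed inside it records the text without moving the match position past it. See PerlMonks: Using Look-ahead and Look-behind.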

So something like this will work:

#!/usr/bin/env perl
use strict;
use warnings;

use Data::Dumper;

my $line = "MZEFSRGGRMEAZFE*MQZEFFMAEZF*";
my @matches = $line =~ m/(?=(M[^*]+))/g;
print Dumper \@matches;

Which gives you:

$VAR1 = [
          'MZEFSRGGRMEAZFE',
          'MEAZFE',
          'MQZEFFMAEZF',
          'MAEZF'
        ];
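
The captures above stop before the *, because [^*]+ cannot match it and nothing in the pattern requires it to follow. If the terminating * should be part of each element, as in the question's expected array, one possible variant is sketched below. The helper name find_orfs is hypothetical and not from the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper (name is illustrative): returns every substring that
# starts with 'M' and runs up to and including the next '*'.
sub find_orfs {
    my ($seq) = @_;
    # The lookahead is zero-width, so the engine retries at every position.
    # The inner group captures 'M', any non-'*' characters (possibly none),
    # and the closing '*'. Call this in list context to get all captures.
    return $seq =~ m/(?=(M[^*]*\*))/g;
}

my @orfs = find_orfs("MZEFSRGGRMEAZFE*MQZEFFMAEZF*");
print "$_\n" for @orfs;
```

Because the * is now required, a trailing run that starts with M but is never closed by a * is not returned, whereas the pattern in the answer above would still capture it.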
