Reputation: 373
I have this string:
my $line = "MZEFSRGGRMEAZFE*MQZEFFMAEZF*";
I want to find every substring starting with M and ending with *, without any * inside it. This means that the above string would give me 4 elements in my final array.
@ORF = ("MZEFSRGGRMEAZFE*", "MEAZFE*", "MQZEFFMAEZF*", "MAEZF*")
A simple regex will not do since it does not find overlapping substrings. Is there a simple way to do this?
Upvotes: 1
Views: 95
Reputation: 54381
You can also use a recursive approach instead of an advanced regex feature to do that. The program below takes each match and reparses it, omitting the starting M so it won't match the whole thing again.
use strict;
use warnings;
use Data::Printer;
my $line = "MZEFSRGGRMEAZFE*MQZEFFMAEZF*";
my @matches;
sub parse {
    my ( $string ) = @_;
    while ( $string =~ m/(M[^*]+\*)/g ) {
        push @matches, $1;
        parse(substr $1, 1);
    }
}
parse($line);
p @matches;
Here's the output:
[
[0] "MZEFSRGGRMEAZFE*",
[1] "MEAZFE*",
[2] "MQZEFFMAEZF*",
[3] "MAEZF*"
]
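As a variation on the recursion above, here is a minimal sketch that returns the matches instead of pushing them onto a file-scoped @matches array. The name find_orfs is hypothetical and not part of the original program; the regex is unchanged.

```perl
use strict;
use warnings;

# find_orfs is a hypothetical name; the logic mirrors parse() above,
# but the results are returned rather than stored in a shared array.
sub find_orfs {
    my ($string) = @_;
    my @found;
    while ( $string =~ m/(M[^*]+\*)/g ) {
        my $match = $1;    # copy $1 into a lexical so later matches cannot affect it
        # Keep the match, then search it again without its leading M.
        push @found, $match, find_orfs( substr $match, 1 );
    }
    return @found;
}

my @orfs = find_orfs("MZEFSRGGRMEAZFE*MQZEFFMAEZF*");
print "$_\n" for @orfs;
```

Each call works on its own copy of the string, so the position tracked by /g in the outer loop is not disturbed by the inner searches.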
Upvotes: 2
Reputation: 53508
Regular expression matching consumes the string as it matches; that's by design.
You can use a lookahead assertion to avoid this. See PerlMonks: Using Look-ahead and Look-behind.
So something like this will work:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my $line = "MZEFSRGGRMEAZFE*MQZEFFMAEZF*";
my @matches = $line =~ m/(?=(M[^*]+))/g;
print Dumper \@matches;
Which gives you (the capture stops before the *, so the terminator is not included):
$VAR1 = [
'MZEFSRGGRMEAZFE',
'MEAZFE',
'MQZEFFMAEZF',
'MAEZF'
];
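To keep the trailing * that the question asks for, the \* can be placed inside the capture group. The sketch below compares a plain global match with that lookahead variant on the same input; the variable names @plain and @overlapping are only illustrative.

```perl
use strict;
use warnings;

my $line = "MZEFSRGGRMEAZFE*MQZEFFMAEZF*";

# Plain global match: each match consumes its text, so candidates
# nested inside an earlier match are never tried.
my @plain = $line =~ m/(M[^*]+\*)/g;

# Lookahead is zero-width, so the engine retries at every position.
# Capturing \* inside the group keeps the terminator in each result.
my @overlapping = $line =~ m/(?=(M[^*]+\*))/g;

print scalar(@plain), " plain matches: @plain\n";
print scalar(@overlapping), " overlapping matches: @overlapping\n";
```

On this input the plain match finds 2 substrings and the lookahead finds all 4. Requiring \* also means an M near the end of the string with no closing * after it is not reported.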
Upvotes: 5