Jerome

Reputation: 25

Using regex to find any last occurrence of a word between two delimiters

Suppose I have the following test string:

Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get_Stop

where _ stands for any run of characters, e.g. StartaGetbbGetcccGetddddStopeeeeeStart....

What I want to extract is the last occurrence of the word Get within each pair of Start and Stop delimiters. The result here would be the final Get before each Stop in the string below (three matches in total).

Start__Get__Get__Get__Stop__Start__Get__Get__Stop__Start__Get__Stop

To be precise, I'd like to do this using only regex and, as far as possible, in a single pass.

Any suggestions are welcome.

Thanks

Upvotes: 0

Views: 2127

Answers (5)

Alan Moore

Reputation: 75222

Get(?=(?:(?!Get|Start|Stop).)*Stop)

I'm assuming your Start and Stop delimiters will always be properly balanced and they can't be nested.
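For anyone who wants to see the pattern above run, here is a minimal Python sketch. The function name `last_get_offsets` is only an illustrative choice, and `re.DOTALL` is an assumption added so that the filler between words may contain newlines.

```python
import re

# Pattern from the answer: a Get that can reach a Stop without first
# crossing another Get, Start, or Stop.
LAST_GET = re.compile(r"Get(?=(?:(?!Get|Start|Stop).)*Stop)", re.DOTALL)


def last_get_offsets(text):
    """Return the start offset of the last Get in each Start...Stop block."""
    return [m.start() for m in LAST_GET.finditer(text)]


sample = "Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get_Stop"
print(last_get_offsets(sample))  # [14, 33, 48]
```

A Get that sits outside any Start...Stop block is rejected as well, because the lookahead runs into the next Start before it finds a Stop.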

Upvotes: 1

Toto

Reputation: 91385

With Perl, I'd do:

my $test = "Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get_Stop";
$test =~ s#(?<=Start_)((Get_)*)(Get)(?=_Stop)#$1<FOUND>$3</FOUND>#g;
print $test;

output:

Start_Get_Get_<FOUND>Get</FOUND>_Stop_Start_Get_<FOUND>Get</FOUND>_Stop_Start_<FOUND>Get</FOUND>_Stop

You should adapt this to your regex flavour.
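As one example of such an adaptation, here is a rough Python port of the substitution. It assumes, like the Perl version, that the separator is a literal single underscore; `mark_last_get` is a made-up helper name.

```python
import re

# Same idea as the Perl substitution: skip the leading "Get_" repeats,
# then wrap the final Get that sits directly before "_Stop".
# The lookbehind is fixed-width, so Python's re module accepts it.
PATTERN = re.compile(r"(?<=Start_)((?:Get_)*)(Get)(?=_Stop)")


def mark_last_get(text):
    """Wrap the last Get of each Start_..._Stop block in <FOUND> tags."""
    return PATTERN.sub(r"\1<FOUND>\2</FOUND>", text)


test = "Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get_Stop"
print(mark_last_get(test))
```

On the test string this should print the same output as the Perl version.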

Upvotes: 0

Peter Boughton

Reputation: 112150

Something like this, maybe:

(?<=Start(?:.Get)*)Get(?=.Stop)

That requires variable-length lookbehind, which not all regex engines support.
It could be given a maximum length, which a few more engines (but still not all) accept, by changing the first * to {0,99} or similar.

Also, in the lookahead, the . should possibly be .+ or .{1,2}, depending on whether the double underscore is a typo.
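Python's built-in `re` module is one of the engines that only accepts fixed-width lookbehind, so the pattern above will not compile there. A possible workaround, sketched below, swaps the lookbehind for a capturing group: the regex consumes the whole Start...Stop block and group 1 holds the last Get. This is a different technique from the lookbehind, and `last_get_in_blocks` is a hypothetical name.

```python
import re

# Consume a whole Start...Stop block and capture the last Get in group 1.
# The greedy first part backtracks to the final Get that can still reach
# Stop without crossing another Get.
BLOCK = re.compile(
    r"Start(?:(?!Start|Stop).)*(Get)(?:(?!Get|Start|Stop).)*Stop",
    re.DOTALL,
)


def last_get_in_blocks(text):
    """Return the start offset of the captured Get for each block."""
    return [m.start(1) for m in BLOCK.finditer(text)]


sample = "Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get_Stop"
print(last_get_in_blocks(sample))  # [14, 33, 48]
```

The trade-off is that the match itself now spans the whole block, so the Get has to be read from the group rather than from the overall match.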

Upvotes: 0

ghostdog74

Reputation: 342313

$ echo "Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get__Stop" | awk -vRS="Stop" -F"_*" '{print $(NF-1)}'
Get
Get
Get

Upvotes: 0

PolyThinker

Reputation: 5218

I would have done it with two passes. The first pass finds the word "Get", and the second pass counts the number of occurrences of it.
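One possible reading of a two-pass approach, sketched in Python: the first pass isolates each Start...Stop segment, and the second pass locates the last Get inside it. Note that this version uses `str.rfind` in the second pass instead of counting occurrences, so it is only an approximation of the idea above, and `last_get_two_pass` is an invented name.

```python
import re

# Pass 1: lazily match each Start...Stop segment (assumes no nesting).
SEGMENT = re.compile(r"Start(.*?)Stop", re.DOTALL)


def last_get_two_pass(text):
    """Return the offset of the last Get in each Start...Stop segment."""
    offsets = []
    for seg in SEGMENT.finditer(text):
        # Pass 2: find the last Get inside this segment, if there is one.
        idx = seg.group(1).rfind("Get")
        if idx != -1:
            offsets.append(seg.start(1) + idx)
    return offsets


sample = "Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get_Stop"
print(last_get_two_pass(sample))  # [14, 33, 48]
```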

Upvotes: 0
