Reputation: 25
Suppose I have the following test string:
Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get_Stop
where _ means any characters, e.g. StartaGetbbGetcccGetddddStopeeeeeStart....
What I want to extract is the last occurrence of the word Get within each pair of Start and Stop delimiters. Here the result would be three matches, namely the Get immediately before each Stop:
Start__Get__Get__Get__Stop__Start__Get__Get__Stop__Start__Get__Stop
I should point out that I'd like to do this using only a regex, and in a single pass if possible.
Any suggestions are welcome.
Thanks.
Upvotes: 0
Views: 2127
Reputation: 75222
Get(?=(?:(?!Get|Start|Stop).)*Stop)
I'm assuming your Start and Stop delimiters will always be properly balanced and can't be nested.
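To check the pattern quickly, here is a minimal Python sketch. It assumes Python's built-in re module, which supports the lookahead and negative lookahead used above, and applies the pattern with re.finditer to report where each match starts. The function name last_get_positions is hypothetical.

```python
import re

# The lookahead pattern above: a Get matches only if the text after it
# reaches Stop without first running into another Get, Start or Stop.
LAST_GET = re.compile(r"Get(?=(?:(?!Get|Start|Stop).)*Stop)")


def last_get_positions(text: str) -> list[int]:
    """Return the start index of the last Get in each Start...Stop block."""
    return [m.start() for m in LAST_GET.finditer(text)]


sample = "Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get_Stop"
print(last_get_positions(sample))  # [14, 33, 48]
```

Since . does not match a newline by default, passing re.DOTALL to re.compile would be needed if a Start...Stop block can span several lines.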
Upvotes: 1
Reputation: 91385
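With Perl, I'd do:
use strict;
use warnings;
my $test = "Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get_Stop";
# "_" is matched literally; $1 keeps the earlier Get_ repetitions, $3 is the last Get
$test =~ s#(?<=Start_)((Get_)*)(Get)(?=_Stop)#$1<FOUND>$3</FOUND>#g;
print "$test\n";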
output:
Start_Get_Get_<FOUND>Get</FOUND>_Stop_Start_Get_<FOUND>Get</FOUND>_Stop_Start_<FOUND>Get</FOUND>_Stop
You'll need to adapt this to your regex flavour.
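As one example of adapting it, here is a sketch of the same substitution in Python's re module (my choice of flavour, not the answerer's). The lookbehind (?<=Start_) is fixed-width, so re accepts it unchanged. Like the Perl version, it treats "_" as a literal underscore. The function name tag_last_get is hypothetical.

```python
import re

# Same pattern as the Perl one-liner; "_" is a literal underscore here.
PATTERN = re.compile(r"(?<=Start_)((Get_)*)(Get)(?=_Stop)")


def tag_last_get(text: str) -> str:
    """Wrap the last Get in each Start_..._Stop block in <FOUND> tags."""
    # \1 keeps the earlier Get_ repetitions, \3 is the final Get.
    return PATTERN.sub(r"\1<FOUND>\3</FOUND>", text)


test = "Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get_Stop"
print(tag_last_get(test))  # same output as the Perl version
```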
Upvotes: 0
Reputation: 112150
Something like this, maybe:
(?<=Start(?:.Get)*.)Get(?=.Stop)
That requires variable-length lookbehind, which not all regex engines provide.
It could be given a maximum length, which a few more engines (but still not all) support, by changing the first * to {0,99} or similar.
Also, in the lookahead, the . should possibly be .+ or .{1,2}, depending on whether the double underscore is a typo.
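Python's built-in re module is one of the engines without variable-length lookbehind, and it rejects the {0,99} form as well, since it only accepts fixed-width lookbehind. A common workaround, sketched below under the assumption that a capture group is acceptable, is to consume the Start...Get prefix instead of looking behind it and capture the final Get. As in the answer, . stands for a single separator character. The helper names compiles and find_last_gets are hypothetical.

```python
import re


def compiles(pattern: str) -> bool:
    """Report whether Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Both lookbehinds are variable-width, so re refuses them at compile time.
print(compiles(r"(?<=Start(?:.Get)*.)Get(?=.Stop)"))      # False
print(compiles(r"(?<=Start(?:.Get){0,99}.)Get(?=.Stop)"))  # False

# Workaround: match the prefix directly and capture the last Get in group 1.
# The greedy (?:.Get)* backtracks until the captured Get is followed by .Stop.
LAST_GET = re.compile(r"Start(?:.Get)*.(Get)(?=.Stop)")


def find_last_gets(text: str) -> list[int]:
    """Return the start index of the captured Get in each block."""
    return [m.start(1) for m in LAST_GET.finditer(text)]


sample = "Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get_Stop"
print(find_last_gets(sample))  # [14, 33, 48]
```

The trade-off is that the match now spans the whole prefix, so the result is read from group 1 rather than from the match itself.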
Upvotes: 0
Reputation: 342313
$ echo "Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get__Stop" | awk -vRS="Stop" -F"_*" '{print $(NF-1)}'
Get
Get
Get
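The awk one-liner splits the input into records at each Stop and then into fields at the underscores. The same idea can be sketched in Python without a regex. This assumes, as the awk version does, that separators are literal underscores and that every block ends in a Get. The function name last_fields is hypothetical.

```python
def last_fields(text: str) -> list[str]:
    """Return the last underscore-separated field before each Stop."""
    results = []
    # Every piece except the final one is terminated by a "Stop".
    for block in text.split("Stop")[:-1]:
        # Dropping empty fields collapses runs of underscores, like -F"_*".
        fields = [f for f in block.split("_") if f]
        if fields:
            results.append(fields[-1])
    return results


print(last_fields("Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get__Stop"))
# ['Get', 'Get', 'Get']
```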
Upvotes: 0
Reputation: 5218
I would have done it in two passes: the first pass finds the word "Get", and the second counts how many times it occurs.
Upvotes: 0