Reputation: 2786
If I have an unknown string of the structure:
"stuff I don't care about THING different stuff I don't care about THING ... THING even more stuff I don't care about THING stuff I care about"
I want to capture the "stuff I care about" which will always be after the last occurrence of THING. There is the potential for 0 occurrences of THING, or many. If there are 0 occurrences then there is no stuff I care about. The string can't start or end with THING.
Some possible strings:
"stuff I don't care about THING stuff I care about"
"stuff I don't care about"
Some not possible strings:
"THING stuff I care about"
"stuff I don't care about THING stuff I don't care about THING"
My current solution to this problem is to use a regex with two greedy quantifiers as follows:
if ( /.*THING(.*)/ ) {    # matches against $_
    $myStuff = $1;
}
It seems to be working, but my question is about how the two greedy quantifiers will interact with each other. Is the first (leftmost) greedy quantifier always "more greedy" than the second?
Basically, am I guaranteed not to get a split like the following:
"stuff I don't care about THING"
$1 = "different stuff I don't care about THING even more stuff I don't care about THING stuff I care about"
Compared to the split I do want:
"stuff I don't care about THING different stuff I don't care about THING even more stuff I don't care about THING"
$1 = "stuff I care about"
Upvotes: 5
Views: 815
Reputation: 6613
Here is my take.
/^(?!THING).+THING((?:(?!THING).)+)$/
It accepts a string with one or more occurrences of THING, where THING cannot be at the beginning or end of the string, and captures the text after the last occurrence of THING.
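A minimal Perl sketch of how this pattern behaves on a few of the strings from the question. The helper name tail_after_last_thing is hypothetical, chosen only for this illustration. Note that the capture keeps the space that follows the last THING.

```perl
use strict;
use warnings;

# The stricter pattern from this answer: THING may not open the string,
# and the captured tail may not contain THING (so THING cannot end it either).
my $strict = qr/^(?!THING).+THING((?:(?!THING).)+)$/;

# Hypothetical helper: text after the last THING, or undef if rejected.
sub tail_after_last_thing {
    my ($s) = @_;
    return $s =~ $strict ? $1 : undef;
}

for my $s (
    "stuff I don't care about THING stuff I care about",
    "stuff I don't care about",
    "THING stuff I care about",
) {
    my $tail = tail_after_last_thing($s);
    printf "%-55s => %s\n", qq("$s"), defined $tail ? qq("$tail") : 'no match';
}
```

The first string yields " stuff I care about"; the other two are rejected, the second because it has no THING and the third because it starts with THING.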
Edit: Added a check for 'THING' at the beginning of the string.
EDIT: Wow, rereading your specs (which I really misread), you said "If there are 0 occurrences then there is no stuff I care about. The string can't start or end with THING."
Then your regex is fine. tripleee explained the situation well.
Upvotes: 0
Reputation: 126742
During the matching process, .*THING will initially match everything up to and including the last occurrence of THING.
If there is no way the rest of the pattern can match, it will backtrack by becoming shorter, match everything up to and including the last-but-one occurrence of THING, and again attempt the rest of the pattern.
However, the rest of the pattern is .*, which will always match because it can match an empty string.
Therefore, .*THING(.*) will match up to and including the last occurrence of THING, and will match and capture the rest of the string.
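To make the split visible, the sketch below (an illustration, not code from the question) adds a second capture group around .*THING and runs it on the three-THING sample string from the question:

```perl
use strict;
use warnings;

# The sample string from the question, with all three THINGs.
my $str = "stuff I don't care about THING different stuff I don't care about "
        . "THING even more stuff I don't care about THING stuff I care about";

# The extra group around .*THING only exists to show where the split falls.
my ( $prefix, $tail );
if ( $str =~ /(.*THING)(.*)/ ) {
    ( $prefix, $tail ) = ( $1, $2 );
    print "prefix: \"$prefix\"\n";
    print "tail:   \"$tail\"\n";
}
```

The prefix runs through the last THING, and the tail is " stuff I care about" (the space after THING is part of the capture).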
Note that . will match anything except newlines. If there could be newlines in your text, then you will want to use the /s modifier to get it to match anything at all.
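A small sketch of the difference, using a made-up two-line input:

```perl
use strict;
use warnings;

# Hypothetical input where the interesting text spans two lines.
my $text = "header THING first line\nsecond line";

# Without /s, . stops at the newline, so the capture ends after "first line".
my ($without_s) = $text =~ /.*THING(.*)/;

# With /s, . also matches "\n", so the capture runs to the end of the string.
my ($with_s) = $text =~ /.*THING(.*)/s;

print "without /s: [$without_s]\n";
print "with /s:    [$with_s]\n";
```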
Note also that if the pattern fails to match (because, say, there is no THING in the string), then $1 will remain unchanged. It will still contain whatever it was set to by the most recent successful pattern match. This means that you must check the status of the pattern match before using the value of $1.
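A sketch of the pitfall and of two safe ways to read the capture. The helper name stuff_i_care_about is hypothetical, and the list-context assignment is a common Perl idiom rather than something taken from the question:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text after the last THING, or undef
# when there is no THING. A failed list-context match returns an empty
# list, so $tail is undef instead of a stale $1.
sub stuff_i_care_about {
    my ($s) = @_;
    my ($tail) = $s =~ /.*THING(.*)/;
    return $tail;
}

my $first  = "left THING right";
my $second = "no marker here";

$first  =~ /.*THING(.*)/;    # succeeds, so $1 becomes " right"
$second =~ /.*THING(.*)/;    # fails, so $1 is NOT reset
my $stale = $1;              # still " right", left over from $first
print "stale \$1: [$stale]\n";

# Safe: only read $1 inside the branch where the match succeeded.
if ( $second =~ /.*THING(.*)/ ) {
    print "stuff I care about: [$1]\n";
}
else {
    print "no THING, so nothing I care about\n";
}

my $result = stuff_i_care_about($second);
print( defined $result ? "got [$result]\n" : "got undef\n" );
```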
Upvotes: 4
Reputation: 189648
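Perl's regex engine returns the leftmost match, and each greedy quantifier takes as much as it can while still letting the rest of the pattern match. Because the first quantifier is tried first, it gets to be greedy first; the second .* only gets whatever is left. The first wildcard will initially match through to the end of the line, then successively backtrack one character at a time until the rest of the regex yields a match, i.e. so that the last THING in the string is matched.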
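For contrast, a sketch comparing the greedy first quantifier with a lazy one (.*?). The lazy version is not what the question uses; it is included only to show why the greediness of the first quantifier matters here:

```perl
use strict;
use warnings;

my $str = "a THING b THING c";

# Greedy first quantifier: backtracks from the end, so it stops at the LAST THING.
my ($greedy) = $str =~ /.*THING(.*)/;

# Lazy first quantifier (contrast only): grows from the left,
# so it stops at the FIRST THING and the capture swallows the rest.
my ($lazy) = $str =~ /.*?THING(.*)/;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```

With the greedy form the capture is " c"; with the lazy form it is " b THING c", which is exactly the unwanted split from the question.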
Upvotes: 12