I'm working on a Java regular expression that matches either "es" or "s" at the end of a string and returns the substring without that suffix. It seems easy, but I can't get the 'e' to match with the expressions I've tried.
Here's the output I should get:
"inches" -> "inch"
"meters" -> "meter"
"ounces" -> "ounc"
but with this regular expression:
Pattern.compile("(.+)(es|s)$", Pattern.CASE_INSENSITIVE);
I'm actually getting:
"inches" -> "inche"
After some research, I discovered that the ".+" part of my pattern is too greedy, and that changing it to this:
Pattern.compile("(.+?)(es|s)$", Pattern.CASE_INSENSITIVE);
fixes the problem. My question, though, is: why did the 's' match at all? If the greedy nature of the algorithm was the problem, shouldn't .+ have matched the whole string?
When .+ matches greedily, it takes as much as it can while still letting the whole expression match. So the greedy version takes everything except the final s, because it cannot also take the s and still satisfy (es|s)$. When .+? matches non-greedily (reluctantly), it takes as little as possible while still letting the whole expression match. Therefore it takes everything except the 'es', because that is the least it can take while the rest of the expression still matches.
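A small Java program can make the difference visible by printing both capture groups for each of the question's two patterns. This is only a sketch: the class name GreedyVsReluctant and the helper split are made up for illustration, and it calls Matcher.find() because the question doesn't say whether find() or matches() was used (for these inputs both yield the same groups).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // The two patterns from the question: greedy .+ versus reluctant (non-greedy) .+?
    static final Pattern GREEDY =
            Pattern.compile("(.+)(es|s)$", Pattern.CASE_INSENSITIVE);
    static final Pattern RELUCTANT =
            Pattern.compile("(.+?)(es|s)$", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns "group1|group2" for the first match, or null if none.
    static String split(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group(1) + "|" + m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println("greedy:    " + split(GREEDY, "inches"));    // greedy:    inche|s
        System.out.println("reluctant: " + split(RELUCTANT, "inches")); // reluctant: inch|es
    }
}
```

In both cases (es|s) does match something; the quantifier only decides which of the two valid cuts wins. The stripped word is then group(1) of the reluctant pattern.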
Short answer
Greedy doesn't mean possessive. A greedy quantifier consumes as much as possible, but it gives characters back (backtracks) as soon as the rest of the pattern would otherwise fail to match.
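Java does offer a genuinely possessive quantifier, written .++, which never gives characters back. A minimal sketch (the class PossessiveDemo and helper finds are hypothetical names) shows that with it the pattern can no longer match inches at all, because nothing is left over for (es|s):

```java
import java.util.regex.Pattern;

public class PossessiveDemo {
    // Greedy .+ backtracks: it hands characters back so (es|s)$ can still match.
    static final Pattern GREEDY =
            Pattern.compile("(.+)(es|s)$", Pattern.CASE_INSENSITIVE);
    // Possessive .++ keeps everything it grabbed, so (es|s) always finds no input left.
    static final Pattern POSSESSIVE =
            Pattern.compile("(.++)(es|s)$", Pattern.CASE_INSENSITIVE);

    static boolean finds(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(finds(GREEDY, "inches"));     // true
        System.out.println(finds(POSSESSIVE, "inches")); // false
    }
}
```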
Long answer
In regular expressions, the Kleene star (*), like the + used here, is greedy: it tries to take as much as possible, but no more than the rest of the pattern allows. Consider the regex:
(.+)(es|s)$
Here .+ aims to eat as much as possible. But the matcher can only reach the end of the regex if it also gets past (es|s), which is only possible if the string ends with at least one s.
Or, if we align the regex with your string inches:
(.+)  (es|s)$
inche s
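(spaces added for alignment). In other words, .+ gives back only the final s, the least it has to surrender so that (es|s) can still match.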
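The give-back process can be modeled by hand. The sketch below is a simplified model of the search order, not how java.util.regex is implemented internally, and GreedyTrace and firstGreedySplit are illustrative names. It starts with .+ holding the whole word and releases one character at a time until the remainder is exactly es or s (compared case-insensitively, mirroring Pattern.CASE_INSENSITIVE):

```java
public class GreedyTrace {
    // Simplified model of greedy (.+)(es|s)$; not the real regex engine.
    // .+ starts by holding the whole input, then gives back one character at a time.
    static String firstGreedySplit(String input) {
        for (int cut = input.length(); cut >= 1; cut--) { // .+ needs at least one character
            String held = input.substring(0, cut);
            String rest = input.substring(cut);
            // Because of $, (es|s) must consume the entire remainder.
            boolean ok = rest.equalsIgnoreCase("es") || rest.equalsIgnoreCase("s");
            System.out.println(".+ holds \"" + held + "\", remainder \"" + rest + "\": "
                    + (ok ? "match" : "give one back"));
            if (ok) {
                return held; // what group(1) would contain
            }
        }
        return null; // no split works, so the regex does not match
    }

    public static void main(String[] args) {
        System.out.println("group 1 = " + firstGreedySplit("inches")); // group 1 = inche
    }
}
```

For inches the trace has only two steps: .+ first holds the whole word, which leaves nothing for (es|s), then gives back the s and stops, so the longer alternative es is never reached.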
When you make it non-greedy, .+? tries to stop eating as soon as possible. For the string inches, that is right after inch:
(.+?) (es|s)$
inch  es
It cannot stop any earlier, because then the h would have to match (es|s), which it cannot.
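The reluctant search runs in the opposite direction. In the same simplified model (ReluctantTrace and firstReluctantSplit are again illustrative names, not library API), .+? starts with one character and takes one more each time the remainder fails, so the first success is the shortest possible prefix:

```java
public class ReluctantTrace {
    // Simplified model of reluctant (.+?)(es|s)$; not the real regex engine.
    // .+? starts with one character and takes one more only when forced to.
    static String firstReluctantSplit(String input) {
        for (int cut = 1; cut <= input.length(); cut++) {
            String held = input.substring(0, cut);
            String rest = input.substring(cut);
            // Because of $, (es|s) must consume the entire remainder.
            boolean ok = rest.equalsIgnoreCase("es") || rest.equalsIgnoreCase("s");
            System.out.println(".+? holds \"" + held + "\", remainder \"" + rest + "\": "
                    + (ok ? "match" : "take one more"));
            if (ok) {
                return held; // what group(1) would contain
            }
        }
        return null; // no split works, so the regex does not match
    }

    public static void main(String[] args) {
        System.out.println("group 1 = " + firstReluctantSplit("inches")); // group 1 = inch
    }
}
```

The step where .+? holds "inc" and the remainder is "hes" is exactly the failure described above. Applied to the question's other inputs, the same model gives meter for meters and ounc for ounces.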