Reputation: 788
I have the following string
https://www.example.com/int/de
and want to match the language code at the end of the URL, e.g. 'de'. I do that with this regex:
/\..*\/.*\/([^\/?]*)\/?$/gi
I would also like to get the same result if the URL ends with a slash.
But with https://www.example.com/int/de/
I only get a full match; the group doesn't match 'de' anymore, although the last slash is optional in the regex.
Can someone see my mistake here?
Upvotes: 2
Views: 524
Reputation: 163217
As an alternative you could consider using parse_url with explode and rtrim to only get the last part.
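    $strings = [
        "https://www.example.com/int/de/",
        "https://www.example.com/int/de"
    ];
    foreach ($strings as $string) {
        // Take only the path, drop any trailing slash, then split on "/"
        $parts = explode("/", rtrim(parse_url($string, PHP_URL_PATH), '/'));
        // The last element is the language code
        echo end($parts) . "<br>";
    }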
That would give you:
de
de
Upvotes: 2
Reputation: 626738
The mistake is not obvious, but it is a common one: a "generic" greedy dot pattern followed by a series of optional subpatterns (patterns that can match an empty string).
The `\..*\/.*\/([^\/?]*)\/?$` pattern matches like this: `\..*` matches a `.` and then any 0+ chars, as many as possible. Backtracking then starts so that `\/` can match a `/`, which is the rightmost `/` in the string (the last one). Next, `.*\/` again matches any 0+ chars, as many as possible, but no `/` is left after the last one, so the engine backtracks even further: it discards the previously found `/` and re-matches the `/` before it, leaving the rightmost `/` for this second `\/`. Then, finally, comes `([^\/?]*)\/?$`, but the previous `.*\/` has already consumed the `/` at the end of the URL, and the regex index is at the end of the string. So, since `([^\/?]*)` can match 0+ chars other than `?` and `/`, and `\/?` can match 0 `/` chars, they both match empty strings at the end of the string, `$` calls it a day, and the regex engine returns a valid match with an empty value in Group 1.
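The behaviour described above can be reproduced with a small sketch. The helper name `captureGroupOne` is hypothetical, and the question's pattern is written with `~` delimiters here on the assumption that it is run through `preg_match`:

```php
<?php
// Hypothetical helper: returns capture group 1, or null when the pattern does not match.
function captureGroupOne(string $pattern, string $url): ?string
{
    return preg_match($pattern, $url, $m) === 1 ? $m[1] : null;
}

// The question's pattern with ~ delimiters (PHP's preg functions have no g flag).
$original = '~\..*\/.*\/([^\/?]*)\/?$~i';

var_dump(captureGroupOne($original, 'https://www.example.com/int/de'));  // string(2) "de"
var_dump(captureGroupOne($original, 'https://www.example.com/int/de/')); // string(0) ""
```

In both cases the pattern matches; only the content of Group 1 differs, which is why the full match alone does not reveal the problem.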
Get rid of the greedy dots and use
    '~([^\/?]+)\/?$~'
See the regex demo.
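As a quick check of this pattern against both URLs from the question, here is a minimal sketch. The function name `lastSegment` is hypothetical, and the sketch assumes the URL carries no query string or fragment (with `.../de?x=1` the group would capture `x=1` instead):

```php
<?php
// Hypothetical helper: last path segment of a URL, with or without a trailing slash.
// Assumes the URL has no query string or fragment.
function lastSegment(string $url): ?string
{
    // One or more chars other than / and ?, then an optional /, anchored at the end.
    return preg_match('~([^/?]+)/?$~', $url, $m) === 1 ? $m[1] : null;
}

echo lastSegment('https://www.example.com/int/de'), "\n";  // de
echo lastSegment('https://www.example.com/int/de/'), "\n"; // de
```

The slashes are left unescaped in the sketch because `~` is the delimiter; escaping them as `\/` is harmless but not required.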
Details
- `([^\/?]+)` - Capturing group 1: one or more chars other than `?` and `/`
- `\/?` - 1 or 0 `/` chars
- `$` - at the end of the string.
Upvotes: 2
Reputation: 2996
The capture group `([^\/?]*)` uses `*`, so it is allowed to match zero characters, and with a trailing slash that is exactly what happens. You need at least one character to match "de". Try using `+` instead of `*` in the group.
Btw, a probably more maintainable regex would be:
    /.*\/([^/]+)\/?$/gi
That regex says 'match anything (`.*`), followed by a forward slash (`\/`), followed by something that is not a forward slash, one or more times (`[^/]+`), followed by the optional forward slash (`\/?`), followed by the end of text (`$`)'. This way, everything up to the forward slash that precedes the language part is consumed by the 'match anything' part of the regex. Note the parentheses around the part that represents the language match.
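To see how the quantifier inside the group affects the result on the trailing-slash URL, here is a sketch that runs this pattern with `*` and with `+` in the group. The helper `groupOne` is hypothetical, and the pattern is rewritten with `~` delimiters on the assumption that it is used with PHP's `preg_match`:

```php
<?php
// Hypothetical helper: returns capture group 1, or null when the pattern does not match.
function groupOne(string $pattern, string $url): ?string
{
    return preg_match($pattern, $url, $m) === 1 ? $m[1] : null;
}

$url = 'https://www.example.com/int/de/';

// ~ delimiters are used so the slashes need no escaping in PHP.
var_dump(groupOne('~.*/([^/]*)/?$~', $url)); // string(0) ""   (group may be empty)
var_dump(groupOne('~.*/([^/]+)/?$~', $url)); // string(2) "de" (group needs 1+ chars)
```

With `*`, the greedy `.*` swallows everything up to the final slash and leaves nothing for the group; with `+`, the engine has to backtrack one slash further so the group can hold `de`.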
Upvotes: 0