Gomino

Reputation: 12347

strange regex ungreedy behavior

I'm trying to understand why the following regex: \/.+?.ext\/ is not working as expected on the following string: http://slash1/slash2/slash3.ext/slash4.

Indeed, I'm only interested in matching the part of the URL that has the '.ext' extension. I first thought adding the ungreedy character would reduce the scope to the closest slash, but that is not the case; it actually matches: //slash1/slash2/slash3.ext/

Here is the link to test it: http://rubular.com/r/CjJZFssQRF

EDIT: Just in case someone else lands here, I finally ended up using the following regex: [^\/]+?\.ext (updated Rubular: http://rubular.com/r/FKcBQI50Lm)
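As a quick check outside Rubular, the regex from the edit can be tried in Python. This is only a sketch, assuming Python's `re` module treats this pattern the same way Rubular's Ruby engine does; the variable names are illustrative.

```python
import re

url = "http://slash1/slash2/slash3.ext/slash4"

# The character class cannot consume a slash, so the match
# stays inside a single path segment.
final = re.search(r"[^\/]+?\.ext", url)
print(final.group())  # slash3.ext
```

Because the class already excludes slashes, the lazy `?` is not strictly needed here; `[^\/]+\.ext` should return the same segment for this input.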

Upvotes: 1

Views: 204

Answers (2)

JonM

Reputation: 1374

Try this instead:

\/[^\/]+?\.ext\/

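The regex engine starts at the very first character and tries to match your expression from there, then from each subsequent character in turn, and it reports the first (leftmost) position where a match succeeds. The lazy quantifier only limits how far the match extends to the right; it does not move the starting point. That's just how regular expressions are executed. Think of it like this: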

^.*?\/[^\/]+?\.ext\/.*$
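To see the leftmost-match behavior concretely, here is a small sketch in Python, assuming its `re` module behaves like the Rubular engine for these patterns. It compares the pattern from the question with the slash-excluding version; the variable names are illustrative.

```python
import re

url = "http://slash1/slash2/slash3.ext/slash4"

# Pattern from the question: the match begins at the first slash in the
# string (index 5), and the lazy quantifier only stops it as early as
# possible on the right-hand side.
original = re.search(r"\/.+?.ext\/", url)
print(original.start(), original.group())  # 5 //slash1/slash2/slash3.ext/

# Excluding slashes between the two delimiters makes the earlier start
# positions fail, so the engine moves on to the slash before "slash3".
fixed = re.search(r"\/[^\/]+?\.ext\/", url)
print(fixed.group())  # /slash3.ext/
```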

Upvotes: 0

KeyNone

Reputation: 9150

Your regex matches everything from the first slash it encounters up to the first ext/ that follows. This explains your match.

You have two possibilities now. You can either go for look-arounds, which are more complicated, or you can simply disallow slashes from being matched between the two delimiting slashes:

\/[^\/]+?\.ext\/

(note: I escaped the dot that is part of the extension; otherwise it would also match slash3aext)
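The effect of escaping the dot can be checked with a short Python sketch. The input string below is a hypothetical example that contains no literal dot, and the variable names are illustrative.

```python
import re

sample = "/slash3aext/"  # hypothetical input without a literal dot

# Unescaped dot: "." matches any character, here the "a" before "ext".
unescaped = re.search(r"\/[^\/]+?.ext\/", sample)
print(unescaped.group())  # /slash3aext/

# Escaped dot: a literal "." is required, so nothing matches.
escaped = re.search(r"\/[^\/]+?\.ext\/", sample)
print(escaped)  # None
```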

demo @ regex101

I'm just taking a guess here, but I think you "thought" from right to left (when I encounter .ext, I want everything until I encounter a slash to the left), when you're supposed to think from left to right, just as a regex engine examines your string.
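For comparison, the look-around option mentioned earlier in this answer might look something like the sketch below. The exact pattern is an assumption made for illustration, not something given in the answer, and it is written for Python's `re` module.

```python
import re

url = "http://slash1/slash2/slash3.ext/slash4"

# The lookbehind and lookahead require a slash on each side without
# including either slash in the match itself.
lookaround = re.search(r"(?<=\/)[^\/]+\.ext(?=\/)", url)
print(lookaround.group())  # slash3.ext
```

The slashes are only asserted, not consumed, so the result is the bare file name rather than /slash3.ext/.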

Upvotes: 4
