Reputation: 352
Forgive me if this is a terribly simple question. It's been a while since I've written regular expressions, so your help brushing off the rust is most appreciated. I am using regex in Python.
I am trying to parse some URLs. Here is the typical format of the URLs I am parsing:
https://www.anysite.com/word/123456789/description-of-the-page
https://www.anysite.com/word/123456789/description-of-the-page/someword
https://www.anysite.com/word/123456789/description-of-the-page/thisword
https://www.anysite.com/word/123456789/description-of-the-page/anyword
I would like to write an expression that will match only the first URL and not the last three. That is, I want a regular expression that matches only if there is no further "/" after the path segment that follows the numeric string "123456789".
Ignoring the main URL, I have tried a negative lookahead assertion without success:
/word\/.+?\/(?!\/).+/
This matches all four examples.
I can't explicitly exclude URLs ending in "/someword", "/thisword", or "/anyword", as I do not have a complete list of these words.
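For reference, here is a minimal Python sketch that reproduces the behaviour. The variable names `attempt`, `urls`, and `matched` are only illustrative, and the surrounding "/" delimiters are dropped because Python's `re` module does not use them. As far as I can tell, the lookahead only inspects the single character right after the second "/", and the trailing `.+` is then free to consume any later "/".

```python
import re

# The attempted pattern, without the /.../ delimiters.
attempt = re.compile(r"word\/.+?\/(?!\/).+")

urls = [
    "https://www.anysite.com/word/123456789/description-of-the-page",
    "https://www.anysite.com/word/123456789/description-of-the-page/someword",
    "https://www.anysite.com/word/123456789/description-of-the-page/thisword",
    "https://www.anysite.com/word/123456789/description-of-the-page/anyword",
]

# search() is used because the pattern ignores the main URL.
matched = [url for url in urls if attempt.search(url)]
print(len(matched))  # 4: every URL matches, which is the problem
```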
Thanks again for looking and your thoughts!
Upvotes: 2
Views: 360
Reputation: 32189
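You can do that by anchoring the pattern to the end of the string and allowing no further "/" after the segment that follows the digits: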
^https?:\/\/[^\d]*(\d+)\/[^\/]*$
Demo: http://regex101.com/r/aC8aJ7
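A minimal sketch of using this pattern from Python, assuming the four sample URLs from the question. The names `PATTERN`, `urls`, and `first_only` are illustrative only. Python has no regex delimiters, so the "/" characters do not need escaping there.

```python
import re

# Same pattern as above, written as a raw string without escaped slashes.
# [^\d]* skips everything up to the first digit, (\d+) captures the number,
# and [^/]*$ requires that no further "/" appears before the end.
PATTERN = re.compile(r"^https?://[^\d]*(\d+)/[^/]*$")

urls = [
    "https://www.anysite.com/word/123456789/description-of-the-page",
    "https://www.anysite.com/word/123456789/description-of-the-page/someword",
    "https://www.anysite.com/word/123456789/description-of-the-page/thisword",
    "https://www.anysite.com/word/123456789/description-of-the-page/anyword",
]

# Keep only the URLs the pattern accepts.
first_only = [url for url in urls if PATTERN.match(url)]
print(first_only)

# The numeric ID is available as capture group 1.
number = PATTERN.match(urls[0]).group(1)
print(number)  # 123456789
```

Note that `[^\d]*` stops at the first digit anywhere after the scheme, so a host name containing a digit would prevent a match. Also, `[^/]*` may match an empty string, so a URL ending right after the "/" that follows the number would be accepted as well.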
Upvotes: 1