user1500158

Reputation: 352

Regular expression matches everything except "/"

Forgive me if this is a terribly simple question. It's been a while since I've written regular expressions, and your help brushing off the rust is most appreciated. I am using regex in Python.

I am trying to parse some URLs. Here is the typical format of the URLs I am parsing:

https://www.anysite.com/word/123456789/description-of-the-page
https://www.anysite.com/word/123456789/description-of-the-page/someword
https://www.anysite.com/word/123456789/description-of-the-page/thisword
https://www.anysite.com/word/123456789/description-of-the-page/anyword

I would like to write an expression that will match only the first URL and not the last three. That is, I want a regular expression that matches only if no further "/" appears after the "/" that follows the numeric string "123456789".

Ignoring the main URL, I have tried a negative lookahead assertion without success:

/word\/.+?\/(?!\/).+/

This matches all four examples.

I can't exclude "/someword", "/thisword", or "/anyword" specifically, as I do not have a complete list of these words.

Thanks again for looking and your thoughts!

Upvotes: 2

Views: 360

Answers (2)

sshashank124

Reputation: 32189

You can do that as:

^https?:\/\/[^\d]*(\d+)\/[^\/]*$

Demo: http://regex101.com/r/aC8aJ7
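A minimal Python sketch of how this pattern could be used with the `re` module. The names `URLS`, `PATTERN`, and `results` are illustrative and not part of the question. Note that `[^\d]*` assumes no digit appears before the numeric ID, so a host name containing a digit would probably not match.

```python
import re

# The four sample URLs from the question.
URLS = [
    "https://www.anysite.com/word/123456789/description-of-the-page",
    "https://www.anysite.com/word/123456789/description-of-the-page/someword",
    "https://www.anysite.com/word/123456789/description-of-the-page/thisword",
    "https://www.anysite.com/word/123456789/description-of-the-page/anyword",
]

# Pattern copied from the answer. The escaped slashes are harmless in Python.
# [^\/]*$ allows no further "/" between the ID's trailing slash and the end.
PATTERN = re.compile(r"^https?:\/\/[^\d]*(\d+)\/[^\/]*$")

# True where the URL has exactly one path segment after the numeric ID.
results = [PATTERN.search(url) is not None for url in URLS]
print(results)

# The capture group holds the numeric ID of the matching URL.
first = PATTERN.search(URLS[0])
print(first.group(1))
```

Only the first URL should match, and group 1 holds the numeric string.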

Upvotes: 1

Toto

Reputation: 91385

How about:

/word\/[^\/]+\/[^\/]+$/
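A minimal Python sketch of this idea. It assumes the pattern ends with a `$` anchor, because with `re.search` an unanchored version also matches a prefix of the longer URLs. The names `URLS`, `ANCHORED`, and `UNANCHORED` are illustrative.

```python
import re

# The four sample URLs from the question.
URLS = [
    "https://www.anysite.com/word/123456789/description-of-the-page",
    "https://www.anysite.com/word/123456789/description-of-the-page/someword",
    "https://www.anysite.com/word/123456789/description-of-the-page/thisword",
    "https://www.anysite.com/word/123456789/description-of-the-page/anyword",
]

# Python has no /.../ regex delimiters, and "/" needs no escaping.
# Two slash-free segments after "word/", then end of string.
ANCHORED = re.compile(r"word/[^/]+/[^/]+$")

# Same pattern without "$": it can stop before a trailing "/segment".
UNANCHORED = re.compile(r"word/[^/]+/[^/]+")

anchored_results = [ANCHORED.search(url) is not None for url in URLS]
unanchored_results = [UNANCHORED.search(url) is not None for url in URLS]

print(anchored_results)
print(unanchored_results)
```

With the anchor, only the first URL should match. Without it, all four do.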

Upvotes: 0
