numpy

Reputation: 23

Why is this regex expression not working?

I want to pick out the links that start with '/wiki/', contain no ':' anywhere, and do not end with '.jpg' or '.svg'.

So these should be rejected -

"https://boomerrang.com"
"/wiki/sbsbs:kjanw"
"/wiki/aswaa:asawsa.jpg"
"/wiki/awssa.random.jpg"
"/wiki/boom.jpg"

What the final result should look like -

"/wiki/justthis"

What I tried -

r'^/wiki/.*[^:](?!jpg|svg)$'

But it's not evaluating properly; in fact, it's returning all the results I do not want. I'm fairly new to regex, so please tell me why this is not working and how I should correct it.

Thanks

Upvotes: 2

Views: 146

Answers (1)

Casimir et Hippolyte

Reputation: 89547

Why your pattern doesn't work:

.*[^:] doesn't prevent a : from being present in the string, since .* can match it. [^:] only requires that the last character isn't a colon.

(?!jpg|svg)$ doesn't make sense since it says that the end of the string isn't followed by "jpg" or "svg". Obviously the end of the string isn't followed by anything since it's the end of the string. Keep in mind that a lookaround (lookahead or lookbehind), anchors like ^, $ or a word-boundary \b are zero-width assertions and don't consume characters, so (?!jpg|svg) and $ are tested from the same position in the string.
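To see both problems in practice, here is a minimal sketch that runs the pattern from the question against the sample links. The variable names (`broken`, `links`, `broken_matches`) are only illustrative and not part of the original question.

```python
import re

# The pattern from the question, unchanged.
broken = re.compile(r'^/wiki/.*[^:](?!jpg|svg)$')

# Sample links taken from the question.
links = [
    "https://boomerrang.com",
    "/wiki/sbsbs:kjanw",
    "/wiki/aswaa:asawsa.jpg",
    "/wiki/awssa.random.jpg",
    "/wiki/boom.jpg",
    "/wiki/justthis",
]

# .* is free to consume ':' characters, and the lookahead is tested at the
# end of the string, where nothing follows, so it never rejects anything.
broken_matches = [link for link in links if broken.search(link)]

for link in broken_matches:
    print(link)
```

Every link that starts with /wiki/ gets through, including the ones with a colon or an image extension; only the https link is filtered out, and that is due to the ^/wiki/ prefix alone.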

You can try this:

r'^/wiki/[^:]*(?<!\.jpg)(?<!\.svg)$'

The two negative lookbehinds at the end ensure that the string doesn't end with .svg or .jpg.

[^:]* matches only characters other than :, so no : can appear anywhere after /wiki/.
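As a quick check, the corrected pattern can be run against the sample links from the question. This is only a sketch; the names `fixed`, `links` and `fixed_matches` are made up for the example.

```python
import re

# Corrected pattern: no ':' after /wiki/, and the string must not end
# with .jpg or .svg (checked by looking back from the end of the string).
fixed = re.compile(r'^/wiki/[^:]*(?<!\.jpg)(?<!\.svg)$')

# Sample links taken from the question.
links = [
    "https://boomerrang.com",
    "/wiki/sbsbs:kjanw",
    "/wiki/aswaa:asawsa.jpg",
    "/wiki/awssa.random.jpg",
    "/wiki/boom.jpg",
    "/wiki/justthis",
]

fixed_matches = [link for link in links if fixed.search(link)]
print(fixed_matches)  # ['/wiki/justthis']
```

Because each lookbehind has a fixed width of four characters, Python's re module accepts them; writing them as two separate lookbehinds rather than one alternation is a style choice here, not a requirement.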

Upvotes: 2
