Reputation: 23
I want to pick out the links in a string that start with '/wiki/', contain no ':' anywhere, and do not end with '.jpg' or '.svg'.
So these should be rejected:
"https://boomerrang.com"
"/wiki/sbsbs:kjanw"
"/wiki/aswaa:asawsa.jpg"
"/wiki/awssa.random.jpg"
"/wiki/boom.jpg"
What the final result should look like:
"/wiki/justthis"
What I tried:
r'^/wiki/.*[^:](?!jpg|svg)$'
But it's not evaluating properly; in fact, it returns all the results I do not want. I'm fairly new to regex, so please tell me why this is not working and how I should correct it.
Thanks
Upvotes: 2
Views: 146
Reputation: 89547
Why your pattern doesn't work:
.*[^:] doesn't prevent a : from being present in the string, since .* can match it.
(?!jpg|svg)$ doesn't make sense, since it asserts that the end of the string isn't followed by "jpg" or "svg". Obviously the end of the string isn't followed by anything, so the lookahead always succeeds there. Keep in mind that lookarounds (lookahead or lookbehind), anchors like ^ and $, and the word boundary \b are zero-width assertions: they don't consume characters, so (?!jpg|svg) and $ are tested at the same position in the string.
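To make both points concrete, here is a small sketch that runs the pattern from the question against the sample strings. The names `original`, `candidates` and `original_matches` are only illustrative and not part of the question.

```python
import re

# The pattern from the question, unchanged.
original = re.compile(r'^/wiki/.*[^:](?!jpg|svg)$')

# Sample strings taken from the question.
candidates = [
    "https://boomerrang.com",
    "/wiki/sbsbs:kjanw",
    "/wiki/aswaa:asawsa.jpg",
    "/wiki/awssa.random.jpg",
    "/wiki/boom.jpg",
    "/wiki/justthis",
]

# Everything that starts with /wiki/ gets through: .* happily matches ':',
# and the lookahead is checked at the end of the string, where nothing follows.
original_matches = [s for s in candidates if original.search(s)]
print(original_matches)

# The lookahead on its own never rejects a string that ends in "jpg".
lookahead_at_end = re.search(r'(?!jpg|svg)$', "boom.jpg")
print(lookahead_at_end is not None)
```

Only the first string is rejected, and only because it does not start with /wiki/.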
You can try this instead:
r'^/wiki/[^:]*(?<!\.jpg)(?<!\.svg)$'
The two negative lookbehinds at the end ensure that the string doesn't end with .svg or .jpg.
[^:]* rules out any : in the string.
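As a quick check, the suggested pattern can be run against the sample strings from the question. This is a minimal sketch; `fixed`, `candidates` and `kept` are illustrative names, and it assumes each link is already available as its own string.

```python
import re

# The suggested pattern: no ':' after /wiki/, and the string must not
# end with ".jpg" or ".svg" (checked by looking back from the end).
fixed = re.compile(r'^/wiki/[^:]*(?<!\.jpg)(?<!\.svg)$')

# Sample strings taken from the question.
candidates = [
    "https://boomerrang.com",
    "/wiki/sbsbs:kjanw",
    "/wiki/aswaa:asawsa.jpg",
    "/wiki/awssa.random.jpg",
    "/wiki/boom.jpg",
    "/wiki/justthis",
]

# Keep only the links the pattern accepts.
kept = [s for s in candidates if fixed.search(s)]
print(kept)
```

Because the lookbehinds sit right before $, they inspect the last four characters of the string, which is where the extension actually is.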
Upvotes: 2