Reputation: 97
I am creating a regex that matches a web URL ending in a filename with an image extension. The base URL (everything before the filename) is dynamic. Here's what I have:
import re
text = 'google.com/dsadasd/dsd.jpg'
dynamic_url = 'google.com/dsadasd'
regex = '{}/(.*)(.gif|.jpg|.jpeg|.tiff|.png)'.format(dynamic_url)
re.search(regex, text)
This works, but it also matches the following URL, which should fail:
text = 'google.com/dsadasd/.jpg'
It should only match if the image file has a filename before the extension. Is there any way to account for this?
If you see improvements to this approach that would let the regular expression handle other edge cases I missed, feel free to say so. Alternative approaches that do not use regex are appreciated as well (maybe a URL parse?). The two most important things to me are performance and clarity, with speed foremost.
Upvotes: 1
Views: 1539
Reputation: 163372
What you might do is use anchors to assert the beginning ^ and the end $ of the line, or use a word boundary \b.
To prevent matching, for example, .jpg right after the forward slash /, you could add a character class containing the characters you want to allow in the filename.
In this example I have added one or more word characters and a hyphen, [\w-]+, but you can update that to your requirements.
The regex part of your code could look like:
^{}/[\w-]+\.(?:gif|jpg|jpeg|tiff|png)$
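As a minimal sketch of how that pattern could be wired into the code from the question: the helper name build_image_url_pattern and the IMAGE_EXTENSIONS tuple are made up for illustration, and passing the base URL through re.escape is an addition not in the answer above, so that a '.' in the base URL is matched literally.

```python
import re

# Extensions taken from the question; adjust as needed.
IMAGE_EXTENSIONS = ('gif', 'jpg', 'jpeg', 'tiff', 'png')

def build_image_url_pattern(dynamic_url):
    # Hypothetical helper: escape the base URL so regex metacharacters
    # in it (such as '.') are treated as literal text.
    return re.compile(
        r'^{}/[\w-]+\.(?:{})$'.format(
            re.escape(dynamic_url), '|'.join(IMAGE_EXTENSIONS)
        )
    )

pattern = build_image_url_pattern('google.com/dsadasd')

print(bool(pattern.search('google.com/dsadasd/dsd.jpg')))  # True
print(bool(pattern.search('google.com/dsadasd/.jpg')))     # False
```

Compiling the pattern once and reusing it avoids rebuilding the string on every check, which may help if the same base URL is tested against many candidates.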
Upvotes: 0
Reputation: 473903
You may also directly apply os.path.splitext():
In [1]: import os
In [2]: text = 'google.com/dsadasd/dsd.jpg'
In [3]: _, extension = os.path.splitext(text)
In [4]: extension
Out[4]: '.jpg'
Then, you may check the extension against a set of supported file extensions.
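A rough sketch of that check, without regex, might look like the following. The function name is_image_url and the SUPPORTED_EXTENSIONS set are assumptions for illustration. It uses posixpath.splitext (the POSIX flavour of os.path.splitext) so that '/' is treated as the separator on every platform. splitext treats a leading dot in the last path component as part of the root, so a bare '.jpg' yields an empty extension and is rejected.

```python
import posixpath

# Assumed set of supported extensions, taken from the question.
SUPPORTED_EXTENSIONS = {'.gif', '.jpg', '.jpeg', '.tiff', '.png'}

def is_image_url(text, dynamic_url):
    # Hypothetical helper: the URL must start with the dynamic base
    # followed by a single '/'.
    prefix = dynamic_url + '/'
    if not text.startswith(prefix):
        return False
    filename = text[len(prefix):]
    # Reject anything with further path segments after the base URL.
    if '/' in filename:
        return False
    root, extension = posixpath.splitext(filename)
    # For a bare '.jpg', splitext returns ('.jpg', ''), so this is False.
    return bool(root) and extension in SUPPORTED_EXTENSIONS

print(is_image_url('google.com/dsadasd/dsd.jpg', 'google.com/dsadasd'))  # True
print(is_image_url('google.com/dsadasd/.jpg', 'google.com/dsadasd'))     # False
```

Unlike the character class in the regex answer, this version places no restriction on which characters the filename may contain; add one if that matters for your requirements.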
Upvotes: 1
Reputation: 17249
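You could try this: (.*)(\w+)(\.gif|\.jpg|\.jpeg|\.tiff|\.png). It just adds a check for at least one word character before the ending extension. Note that the dots are escaped so they match a literal . rather than any character.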
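To illustrate why escaping the dot matters in a pattern of this shape, here is a small comparison. The variable names and the sample string 'google.com/dsadasd/notjpg' are invented for the demonstration.

```python
import re

# With an unescaped '.', the extension group matches ANY character
# followed by 'jpg', not just a literal dot.
unescaped = re.compile(r'(.*)(\w+)(.gif|.jpg|.jpeg|.tiff|.png)')
escaped = re.compile(r'(.*)(\w+)(\.gif|\.jpg|\.jpeg|\.tiff|\.png)')

sample = 'google.com/dsadasd/notjpg'  # no real extension here

print(bool(unescaped.search(sample)))  # True: 't' is consumed by the '.'
print(bool(escaped.search(sample)))    # False: no literal '.jpg' present

# The case from the question is rejected by the escaped pattern,
# because '/' is not a word character.
print(bool(escaped.search('google.com/dsadasd/.jpg')))  # False
```

Neither pattern is anchored, so text after the extension would still be accepted; adding $ at the end is one way to tighten that.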
Upvotes: 0