me_man

Reputation: 97

Regex to match image extensions with dynamic url in Python

I am creating a regex that matches a web url ending in a filename with an image extension. The base url (everything before the filename) will be dynamic. Here's what I have so far:

import re

text = 'google.com/dsadasd/dsd.jpg'

dynamic_url = 'google.com/dsadasd'
regex = '{}/(.*)(.gif|.jpg|.jpeg|.tiff|.png)'.format(dynamic_url)

re.search(regex, text)

This works, but it also matches the following url, which should fail:

text = 'google.com/dsadasd/.jpg'

It should only match if there is a filename for the image file. Any way to account for this?

If you see improvements to this approach that would let the regular expression handle other edge cases I have missed, definitely feel free to say so. Alternative approaches that do not use regex are appreciated as well (maybe a url parse?). The two most important things to me are performance and clarity, with speed foremost.

Upvotes: 1

Views: 1539

Answers (3)

The fourth bird

Reputation: 163372

What you might do is use anchors to assert the start ^ and the end $ of the line, or use a word boundary \b.

To prevent matching, for example, .jpg right after the forward slash /, you could use a character class containing the characters you want to allow in the filename.

In this example I allow one or more word characters or hyphens, [\w-]+, but you can adjust that to your requirements.

The regex part of your code could look like:

^{}/[\w-]+\.(?:gif|jpg|jpeg|tiff|png)$
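As a rough sketch of how that pattern might be plugged into the question's code: the helper name is_image_url and the IMAGE_EXTENSIONS tuple are made up for illustration, and re.escape is an addition not present in the original code, used so that characters such as the dot in the dynamic url are matched literally.

```python
import re

# Hypothetical names for illustration; the extensions come from the question.
IMAGE_EXTENSIONS = ("gif", "jpg", "jpeg", "tiff", "png")


def is_image_url(text, dynamic_url):
    # Escape the base url so that "." and similar characters match literally.
    pattern = r"^{}/[\w-]+\.(?:{})$".format(
        re.escape(dynamic_url), "|".join(IMAGE_EXTENSIONS)
    )
    return re.search(pattern, text) is not None


print(is_image_url('google.com/dsadasd/dsd.jpg', 'google.com/dsadasd'))  # True
print(is_image_url('google.com/dsadasd/.jpg', 'google.com/dsadasd'))     # False
```

If the same base url is checked many times, compiling the pattern once with re.compile and reusing it would likely be the faster choice.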


Upvotes: 0

alecxe

Reputation: 473903

You may also directly apply os.path.splitext():

In [1]: import os

In [2]: text = 'google.com/dsadasd/dsd.jpg'

In [3]: _, extension = os.path.splitext(text)

In [4]: extension
Out[4]: '.jpg'

Then, you may check the extension against a set of supported file extensions.
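A minimal sketch of that check, under a few assumptions: the names SUPPORTED_EXTENSIONS and has_image_extension are hypothetical, the base url test with startswith is added to mirror the question's dynamic prefix, and lowercasing the extension is a choice not stated in the question. Note that os.path.splitext treats a leading dot in the last path component as part of the name, so '.jpg' on its own yields an empty extension.

```python
import os

# Hypothetical names for illustration; the extensions come from the question.
SUPPORTED_EXTENSIONS = {'.gif', '.jpg', '.jpeg', '.tiff', '.png'}


def has_image_extension(text, dynamic_url):
    # Assumption: the url must start with the dynamic base url plus a slash.
    prefix = dynamic_url + '/'
    if not text.startswith(prefix):
        return False
    filename = text[len(prefix):]
    # A name such as '.jpg' has no extension according to splitext.
    _, extension = os.path.splitext(filename)
    # Assumption: extensions are compared case-insensitively.
    return extension.lower() in SUPPORTED_EXTENSIONS


print(has_image_extension('google.com/dsadasd/dsd.jpg', 'google.com/dsadasd'))  # True
print(has_image_extension('google.com/dsadasd/.jpg', 'google.com/dsadasd'))     # False
```

This avoids regex entirely, and the set lookup keeps the extension check constant-time.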

Upvotes: 1

Colin Ricardo

Reputation: 17249

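You could try this: (.*)(\w+)(\.gif|\.jpg|\.jpeg|\.tiff|\.png). It just adds a check for at least one word character before the ending .whatever. The dots are escaped here so that they match a literal period rather than any character.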

Upvotes: 0
