Souvik Ray

Reputation: 3018

How to extract a certain pattern from a URL using regex in Python?

I have a bunch of URLs like the ones below:

https://data.hova.com/strap/nik/sql_output1574414532.89.zip

https://data.hova.com/strap/asr/sql_output1574414532.89.zip

https://data.hova.com/strap/olr/sql_output1574414532.89.zip

Now I want to extract just the zip file name, i.e. sql_output1574414532.89.zip in each case.

I could have used a simple split to get the file names, but notice that the directory name before the zip file changes (nik, asr, olr, etc.).

So I want to use a regex that matches only the part that starts with sql and ends with zip.

This is what I tried:

import re

string = "https://data.hova.com/strap/nik/sql_output1574414532.89.zip"
pattern = r'^sql\.zip$'
match = re.search(pattern, string)
print(match)

But match comes out as None. What am I doing wrong?

Upvotes: 1

Views: 145

Answers (1)

Budagov Blues

Reputation: 46

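The pattern r'^sql\.zip$' is anchored at both ends with ^ and $ and allows nothing between sql and .zip, so it matches only the string "sql.zip" itself, not a URL that merely contains a file name starting with sql and ending with zip.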

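For your purpose you need something like sql.+zip$. If the string sql could also appear earlier in the URL, before the file name, use sql[^/]+zip$ instead, so that the match cannot span a slash. To require a literal dot before zip, write \.zip$ in either pattern.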
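As a minimal sketch of the second pattern applied to the URLs from the question (the helper name extract_zip_name is made up for illustration, and the escaped dot in \.zip is an added assumption that the extension should be matched literally):

```python
import re

urls = [
    "https://data.hova.com/strap/nik/sql_output1574414532.89.zip",
    "https://data.hova.com/strap/asr/sql_output1574414532.89.zip",
    "https://data.hova.com/strap/olr/sql_output1574414532.89.zip",
]

# sql      literal prefix of the file name
# [^/]+    one or more characters that are not a slash,
#          so the match stays inside the last path segment
# \.zip$   literal ".zip" at the very end of the string
pattern = re.compile(r"sql[^/]+\.zip$")


def extract_zip_name(url):
    """Return the sql...zip file name at the end of url, or None if absent.

    Hypothetical helper, not part of the original question or answer.
    """
    match = pattern.search(url)  # search scans the string; no ^ anchor needed
    return match.group(0) if match else None


names = [extract_zip_name(u) for u in urls]
print(names)
```

If a regex is not strictly required, url.rsplit("/", 1)[-1] also returns the last path segment no matter what the directory before it is called.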

Upvotes: 1
