Reputation: 3018
I have a bunch of URLs like the ones below:
https://data.hova.com/strap/nik/sql_output1574414532.89.zip
https://data.hova.com/strap/asr/sql_output1574414532.89.zip
https://data.hova.com/strap/olr/sql_output1574414532.89.zip
Now I want to extract just the zip file name, i.e. sql_output1574414532.89.zip, sql_output1574414532.89.zip, and sql_output1574414532.89.zip respectively.
I could have used a simple split to get the file names, but if you look closely, the directory name before the zip file changes (nik, asr, olr, etc.).
So I want to use a regex that only looks at anything that starts with sql and ends with zip.
This is what I did:
import re
string = "https://data.hova.com/strap/nik/sql_output1574414532.89.zip"
pattern = r'^sql\.zip$'
match = re.search(pattern, string)
print(match)
But match comes out as None. What am I doing wrong?
Upvotes: 1
Views: 145
Reputation: 46
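The pattern r'^sql\.zip$' matches only one string: "sql.zip". The ^ and $ anchors tie it to the start and end of the whole input, and nothing is allowed between sql and .zip, so a full URL can never match.
For your purpose you need something like sql.+zip$, or, if you expect that the string sql can appear in the URL before the file name, change it to sql[^/]+zip$ so the match cannot span a slash.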
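As a minimal sketch of the second suggestion applied to the three URLs from the question: the helper name extract_zip_name is made up for illustration, and the pattern escapes the dot before zip (\.zip), which is a slightly stricter variant of the one above rather than a requirement.

```python
import re

urls = [
    "https://data.hova.com/strap/nik/sql_output1574414532.89.zip",
    "https://data.hova.com/strap/asr/sql_output1574414532.89.zip",
    "https://data.hova.com/strap/olr/sql_output1574414532.89.zip",
]

# "sql", then one or more characters that are not "/", then ".zip"
# at the very end of the string. No ^ anchor, so the match may start
# anywhere inside the URL.
pattern = re.compile(r"sql[^/]+\.zip$")


def extract_zip_name(url):
    """Return the sql*.zip file name at the end of url, or None if there is none.

    Hypothetical helper, not part of the original post.
    """
    match = pattern.search(url)
    return match.group(0) if match else None


names = [extract_zip_name(u) for u in urls]
print(names)
```

If a regex is not strictly required, url.rsplit("/", 1)[-1] also returns the last path segment regardless of which directory precedes it.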
Upvotes: 1