Reputation: 127
I'm trying to have Python extract some text from a URL string.
Here's an example URL: https://somewebsite/images/products/SkuName/genricFileName.jpg
The SkuName always comes after the 5th "/" and ends at the 6th "/".
I would like to extract 'SkuName'.
import urllib.request
images = input('please enter url list separated by ","')
names = input('please enter images names separated by ","')
images = images.split(',')
names = names.split(',')
for index, image in enumerate(images):
    urllib.request.urlretrieve(image, "images/{}.jpg".format(names[index]))
print('images downloaded successfully')
As you can see, the user has to manually enter the SKU name (which is stored in the variable 'names').
I would like the user to enter only one input (the URL) and have Python automatically extract the SKU name from the URL string.
Thanks!
Upvotes: 0
Views: 932
Reputation: 953
You can do it with Python's re module. Note: adjust the pattern to match your URL.
import re
url = 'https://somewebsite/images/products/SkuName/genricFileName.jpg'
pattern = re.compile(r'(?<=https://somewebsite/images/products/).*(?=/genricFileName\.jpg)', re.I)
sku_name = pattern.search(url).group()
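If the file name differs from one URL to the next, hardcoding it in the lookahead will not match. A minimal sketch of an alternative, assuming the SKU is always the single path segment right after /images/products/ (extract_sku and SKU_PATTERN are hypothetical names, not part of the question):

```python
import re

# Capture the one path segment that follows "/images/products/".
SKU_PATTERN = re.compile(r'/images/products/([^/]+)/', re.I)

def extract_sku(url):
    """Return the SKU segment, or None if the URL does not match."""
    match = SKU_PATTERN.search(url)
    return match.group(1) if match else None

url = 'https://somewebsite/images/products/SkuName/genricFileName.jpg'
sku_name = extract_sku(url)
print(sku_name)  # SkuName
```

Returning None instead of calling .group() on a failed search avoids an AttributeError when a URL has an unexpected shape.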
Upvotes: 1
Reputation: 26
You seem to be aware of the split function already. You can use it in combination with indexing to get what you need.
skuName = input('url').split('/')[-2]
This yields the second-to-last element of the list. You could also take the 6th element (index 5) directly:
skuName = input('url').split('/')[5]
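To see why both indexes land on the same element, it may help to look at the list that split produces. A small sketch using the example URL from the question in place of input():

```python
url = "https://somewebsite/images/products/SkuName/genricFileName.jpg"

parts = url.split("/")
# The "//" after "https:" produces an empty string at index 1.
print(parts)
# ['https:', '', 'somewebsite', 'images', 'products', 'SkuName', 'genricFileName.jpg']

sku_from_end = parts[-2]    # counts back from the file name
sku_from_start = parts[5]   # counts forward from the scheme
print(sku_from_end, sku_from_start)  # SkuName SkuName
```

Counting from the end keeps working if extra folders are added before the SKU; counting from the start keeps working if extra folders are added after it.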
Upvotes: 0
Reputation: 90
If that format is fixed, you can just split the URL and access the second-to-last element of the resulting list:
url = "https://somewebsite/images/products/SkuName/genricFileName.jpg"
skuName = url.split("/")[-2]
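Applied to the download loop from the question, this removes the need for the second input. A rough sketch, assuming every URL follows the same layout; sku_from_url and download_images are hypothetical helper names, and urlsplit is used here only so that a query string such as ?size=large would not end up in the name:

```python
import os
import urllib.request
from urllib.parse import urlsplit

def sku_from_url(url):
    """Return the second-to-last path segment of the URL (the SKU folder)."""
    path = urlsplit(url).path  # path only: no scheme, host, query or fragment
    return path.rstrip("/").split("/")[-2]

def download_images(urls, dest_dir="images"):
    """Download each URL to dest_dir/<SKU>.jpg, naming files from the URL."""
    os.makedirs(dest_dir, exist_ok=True)
    for url in urls:
        url = url.strip()  # tolerate spaces around the commas
        target = os.path.join(dest_dir, "{}.jpg".format(sku_from_url(url)))
        urllib.request.urlretrieve(url, target)

# Example use (needs network access and real URLs):
# download_images(input('please enter url list separated by ","').split(","))
```

Note that two URLs with the same SKU would write to the same file, so the later download overwrites the earlier one.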
Upvotes: 0
Reputation: 2780
If you're sure that the (absolute) position of the name in the URL won't change, then url.split('/')[5]
should solve your problem.
Upvotes: 1