rkrox 907

Reputation: 127

How to extract some text from a url in Python

I'm trying to have Python extract some text from a URL string.

Here's an example URL: https://somewebsite/images/products/SkuName/genricFileName.jpg

The SkuName always comes after the 5th "/" and ends at the 6th "/".

I would like to extract 'SkuName'.

import urllib.request

images = input('please enter url list separated by ","')
names = input('please enter images names separated by ","')

images = images.split(',')
names =  names.split(',')

for index, image in enumerate(images):
    urllib.request.urlretrieve(image, "images/{}.jpg".format(names[index])) 
print('images downloaded successfully')   

As you can see, the user has to enter the SKU name manually (it goes into the variable 'names').

I would like the user to enter only one input (the URL list) and have Python extract the SkuName from each URL string automatically.

Thanks!

Upvotes: 0

Views: 932

Answers (4)

Karmveer Singh

Reputation: 953

You can do it with a Python regex. Note: adjust the pattern to match your URL.

import re
url = 'https://somewebsite/images/products/SkuName/genricFileName.jpg'
# Lookbehind/lookahead anchor the match between the fixed prefix and the file name; dots are escaped so they match literally.
pattern = re.compile(r'(?<=https://somewebsite/images/products/).*(?=/genricFileName\.jpg)', re.I)
sku_name = pattern.search(url).group()
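
The pattern above hardcodes the file name, so it only matches that one URL. A looser variant, as a minimal sketch: it assumes the SKU is always the path segment directly after "/products/", and the names SKU_PATTERN and extract_sku are made up for illustration.

```python
import re

# Hypothetical pattern: capture everything between "/products/" and the next "/".
SKU_PATTERN = re.compile(r"/products/([^/]+)/")


def extract_sku(url):
    """Return the path segment after /products/, or None if nothing matches."""
    match = SKU_PATTERN.search(url)
    return match.group(1) if match else None


url = "https://somewebsite/images/products/SkuName/genricFileName.jpg"
print(extract_sku(url))  # SkuName
```

Returning None instead of calling .group() directly avoids an AttributeError when a URL does not follow the expected layout.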

Upvotes: 1

gsuparto

Reputation: 26

You seem to be aware of the split function already. You can use it in combination with indexing to get what you need.

skuName = input('url').split('/')[-2]

This yields the second-to-last element of the list. You could also pick the 6th element (index 5) directly:

skuName = input('url').split('/')[5]
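
Both indexes work on the example URL, but a query string containing "/" would shift the negative index. One way around that, sketched under the assumption that the SKU is the second-to-last segment of the URL path, is to split only the path part with urllib.parse.urlsplit (sku_from_path is a made-up helper name):

```python
from urllib.parse import urlsplit


def sku_from_path(url):
    """Return the second-to-last segment of the URL path only."""
    path = urlsplit(url).path      # e.g. '/images/products/SkuName/genricFileName.jpg'
    segments = path.split("/")     # ['', 'images', 'products', 'SkuName', 'genricFileName.jpg']
    return segments[-2]


url = "https://somewebsite/images/products/SkuName/genricFileName.jpg?ref=a/b"
print(url.split("/")[-2])  # genricFileName.jpg?ref=a
print(sku_from_path(url))  # SkuName
```

This still assumes the path has at least two segments. A URL with a shorter path would return the wrong piece or raise an IndexError.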

Upvotes: 0

drumino

Reputation: 90

If that format is fixed, you can just split the URL and access the second-to-last element of the resulting list:

url = "https://somewebsite/images/products/SkuName/genricFileName.jpg"
skuName = url.split("/")[-2]
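
Applied to the loop from the question, this could look roughly like the sketch below. It only builds the (url, target file) pairs so it can be tried without network access; plan_downloads is a hypothetical helper, and the "images/" folder and ".jpg" extension are taken from the question's code.

```python
def plan_downloads(raw_urls):
    """Turn 'url1,url2' into [(url, 'images/<SkuName>.jpg'), ...]."""
    plan = []
    for url in raw_urls.split(","):
        url = url.strip()           # tolerate spaces after the commas
        if not url:
            continue                # skip empty entries such as a trailing comma
        sku = url.split("/")[-2]    # second-to-last element, as above
        plan.append((url, "images/{}.jpg".format(sku)))
    return plan


raw = ("https://somewebsite/images/products/SkuA/a.jpg, "
       "https://somewebsite/images/products/SkuB/b.jpg")
for url, target in plan_downloads(raw):
    print(url, "->", target)
    # urllib.request.urlretrieve(url, target)  # needs `import urllib.request`
```

With this, the second input() for the names is no longer needed, since each file name is derived from its own URL.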

Upvotes: 0

Guybrush

Reputation: 2780

If you're sure that the (absolute) position of the name in the URL won't change, then url.split('/')[5] should solve your problem.

Upvotes: 1
