Reputation: 1617

How to get a specific part of a URL using urlparse()?

I have a URL like this:

url = 'https://grabagun.com/firearms/handguns/semi-automatic-handguns/glock-19-gen-5-polished-nickel-9mm-4-02-inch-barrel-15-rounds-exclusive.html'

When I use the urlparse() function, I get a result like this:

>>> url = urlparse(url) 
>>> url.path
'/firearms/handguns/semi-automatic-handguns/glock-19-gen-5-polished-nickel-9mm-4-02-inch-barrel-15-rounds-exclusive.html'

Is it possible to get something like this:

path1 = "firearms"
path2 = "handguns"
path3 = "semi-automatic-handguns"

I also don't want any segment that has ".html" at the end.

Upvotes: -1

Answers (4)

Bhargav

Reputation: 4118

The full URL contains both single / and double // separators, so replace them all with the same separator first if you want to split the URL string directly. With url.path you can split directly:

url = '/firearms/handguns/semi-automatic-handguns/glock-19-gen-5-polished-nickel-9mm-4-02-inch-barrel-15-rounds-exclusive.html'

url = url.split('/')
url = list(filter(None, url))  # remove empty elements
url.pop()  # drop the last element (the .html page name)
print(url)

Output list:

['firearms', 'handguns', 'semi-automatic-handguns']

Part 2

If you want separate variables, iterate over the list and create them. Numbering starts at 1 so that path1 is the first segment, as in the question:

for n, val in enumerate(url, start=1):
    globals()["path%d" % n] = val

print(path1)

Output:

firearms
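
Writing into globals() works, but the resulting names are hard to track. A dictionary can carry the same path1, path2, ... labels without touching the module namespace. This is only a minimal sketch: `segments` and `paths` are illustrative names, and the keys are numbered from 1 to match the question.

```python
# Assumed starting point: the list printed in the first part of this answer.
segments = ['firearms', 'handguns', 'semi-automatic-handguns']

# Map "path1", "path2", ... to each segment instead of creating globals.
paths = {f"path{n}": val for n, val in enumerate(segments, start=1)}

print(paths["path1"])  # firearms
```

Looking up paths["path2"] then replaces the bare name path2, and iterating over paths.items() gives every label with its value.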

Upvotes: 1

imxitiz

Reputation: 3987

A short solution to your problem could be:

from urllib.parse import urlparse

path = urlparse(url).path[1:]  # drop the leading "/"

splittedpath = [sp for sp in path.split("/") if not sp.endswith(".html")]
# ['firearms', 'handguns', 'semi-automatic-handguns']

You can access these by:

print(splittedpath[0]) # 0,1,2... 
# firearms

Here, the leading "/" is removed from the path with .path[1:], the remaining string is split at each occurrence of "/" with .split("/"), and each resulting segment is kept only if it does not end with ".html".
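
As an alternative to manual string splitting, the path can also be handed to pathlib, which already knows about components and suffixes. This is a different technique from the list comprehension above, shown as a sketch; `p`, `directory` and `dir_parts` are illustrative names.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

url = 'https://grabagun.com/firearms/handguns/semi-automatic-handguns/glock-19-gen-5-polished-nickel-9mm-4-02-inch-barrel-15-rounds-exclusive.html'

# URL paths use "/" separators, so PurePosixPath is the matching flavour.
p = PurePosixPath(urlparse(url).path)

# If the last component has an .html suffix, work with its parent directory.
directory = p.parent if p.suffix == '.html' else p

# parts starts with the root "/", so skip it.
dir_parts = list(directory.parts[1:])

print(dir_parts)  # ['firearms', 'handguns', 'semi-automatic-handguns']
```

Unlike the comprehension, this only ever drops the final component, so a directory in the middle of the path that happens to end in ".html" would be kept.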

Upvotes: 1

bener07

Reputation: 29

You can put all the segments in a list by splitting the path on /:

url.path.split('/')

If you want them in path1, path2 and so on, you can assign the values in the list to variables. Index 0 is the empty string before the leading /, so the slice starts at 1:

path1, path2, path3 = url.path.split('/')[1:4]

This takes only the first 3 segments of the list. If you don't want the text with .html, you can get the index of the last value and use it in the list slicing like this:

paths = url.path.split('/')
html_text_index = len(paths)  # keep everything by default
if '.html' in paths[-1]:
    html_text_index = len(paths) - 1  # index of the last value
no_html_paths = paths[:html_text_index]
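
Unpacking into exactly three names raises ValueError when the path has fewer than three segments. One possible way around that is to pad the list before slicing, sketched below. The shorter URL and the name `segments` are hypothetical and used only for illustration; `url` is assumed to be the result of urlparse(), as in the rest of this answer.

```python
from urllib.parse import urlparse

# Hypothetical shorter URL with only two directory segments.
url = urlparse('https://grabagun.com/firearms/handguns/item.html')

# Keep non-empty segments that are not the .html page name.
segments = [s for s in url.path.split('/') if s and not s.endswith('.html')]

# Pad with None so unpacking into three names never raises ValueError.
path1, path2, path3 = (segments + [None] * 3)[:3]

print(path1, path2, path3)  # firearms handguns None
```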

Upvotes: 1

arielkaluzhny

Reputation: 336

path_list = url.path.split('/')

if ".html" in path_list[-1]:
    path_list = path_list[:-1]

This gives you a list with each part as an entry and excludes the last one if it contains ".html". Note that the first entry is an empty string, because the path starts with "/".

Depending on exactly what you want, and on how specific or general your use case is, you can edit this.
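
One edit that may be worth considering: the `in` test matches ".html" anywhere in a segment, whereas str.endswith() matches only at the end. The sketch below uses made-up segments purely to show the difference; `contains` and `ends` are illustrative names.

```python
# Hypothetical segments, chosen only to show the difference.
segments = ['guides', 'intro.html-archive', 'index.html']

# Substring test: matches ".html" anywhere in the segment.
contains = [s for s in segments if '.html' in s]

# Suffix test: matches only segments that end with ".html".
ends = [s for s in segments if s.endswith('.html')]

print(contains)  # ['intro.html-archive', 'index.html']
print(ends)      # ['index.html']
```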

Upvotes: 1
