Reputation: 1617
I have a URL like this:
url = 'https://grabagun.com/firearms/handguns/semi-automatic-handguns/glock-19-gen-5-polished-nickel-9mm-4-02-inch-barrel-15-rounds-exclusive.html'
When I use the urlparse() function, I get a result like this:
>>> url = urlparse(url)
>>> url.path
'/firearms/handguns/semi-automatic-handguns/glock-19-gen-5-polished-nickel-9mm-4-02-inch-barrel-15-rounds-exclusive.html'
Is it possible to get something like this:
path1 = "firearms"
path2 = "handguns"
path3 = "semi-automatic-handguns"
I also don't want any segment that ends with ".html".
Upvotes: -1
Views: 248
Reputation: 4118
The full URL contains both single / and double // separators, so if you want to split the URL string directly, first replace them all with the same separator. With url.path you can split directly:
url = '/firearms/handguns/semi-automatic-handguns/glock-19-gen-5-polished-nickel-9mm-4-02-inch-barrel-15-rounds-exclusive.html'
url = url.split('/')
url = list(filter(None, url))  # remove empty elements
url.pop()  # drop the last segment (the .html page)
print(url)
Output:
['firearms', 'handguns', 'semi-automatic-handguns']
Part 2
If you want them as separate variables, iterate over the list and create the variables:
for n, val in enumerate(url, start=1):  # start at 1 so path1 is the first segment
    globals()["path%d" % n] = val
print(path1)
Output:
firearms
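Creating names through globals() works, but a dictionary is often easier to manage when the number of segments is not known in advance. The following is only a sketch of that alternative; the `segments` list and the `paths` dictionary are illustrative names, not part of the original code.

```python
# Hypothetical input: the directory segments already extracted from the path.
segments = ['firearms', 'handguns', 'semi-automatic-handguns']

# Key each segment by a 1-based name instead of creating global variables.
paths = {f"path{i}": seg for i, seg in enumerate(segments, start=1)}

print(paths["path1"])  # firearms
print(paths["path3"])  # semi-automatic-handguns
```

Looking up paths["path4"] raises a KeyError rather than a NameError, and paths.get("path4") returns None, which can be simpler to handle.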
Upvotes: 1
Reputation: 3987
A short solution to your problem could be:
from urllib.parse import urlparse
path = urlparse(url).path[1:]
splittedpath = [sp for sp in path.split("/") if not sp.endswith(".html")]
"""
['firearms', 'handguns', 'semi-automatic-handguns']
"""
You can access these by:
print(splittedpath[0]) # 0,1,2...
# firearms
What we are doing here: first, the leading "/" of the path is removed with urlparse(url).path[1:]; then the path is split at each occurrence of "/" using .split("/"); finally, each resulting segment is kept only if it does not end with ".html".
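The same idea can be wrapped in a small function so it can be reused for other URLs. This is a minimal sketch: the function name `directory_segments` and its `suffix` parameter are made up for illustration, and it also skips empty segments so that a trailing or doubled "/" does not produce empty strings.

```python
from urllib.parse import urlparse

def directory_segments(url, suffix=".html"):
    """Return the non-empty path segments of url that do not end with suffix."""
    path = urlparse(url).path
    return [seg for seg in path.split("/") if seg and not seg.endswith(suffix)]

url = 'https://grabagun.com/firearms/handguns/semi-automatic-handguns/glock-19-gen-5-polished-nickel-9mm-4-02-inch-barrel-15-rounds-exclusive.html'
print(directory_segments(url))
# ['firearms', 'handguns', 'semi-automatic-handguns']
```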
Upvotes: 1
Reputation: 29
You can put all the parts in a list by splitting on /:
url.path.split('/')
If you want them in path1, path2 and so on, you can assign the values in the list to variables. Because url.path starts with /, the first element of the list is an empty string, so skip it:
path1, path2, path3 = url.path.split('/')[1:4]
This takes only the first 3 segments. If you don't want the text with .html, you can get the index of the last value and use it in the list slicing like this:
paths = url.path.split('/')
no_html_paths = paths  # default when there is no .html segment
if '.html' in paths[-1]:
    html_text_index = len(paths) - 1  # index of the last element
    no_html_paths = paths[:html_text_index]
Upvotes: 1
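Unpacking into exactly three variables fails with a ValueError when the path has fewer than three directory segments. One possible way around that, sketched below, is to pad the list with None before slicing. The helper name `first_three` and the example.com URL are hypothetical.

```python
from urllib.parse import urlparse

def first_three(url):
    """Return the first three directory segments, padded with None if fewer exist."""
    segments = [s for s in urlparse(url).path.split('/')
                if s and not s.endswith('.html')]
    # Padding guarantees the slice always has exactly three items.
    return tuple((segments + [None] * 3)[:3])

# Hypothetical URL with only two directory segments.
path1, path2, path3 = first_three('https://example.com/firearms/handguns/item.html')
print(path1, path2, path3)  # firearms handguns None
```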
Reputation: 336
path_list = url.path.split('/')
if ".html" in path_list[-1]:
    path_list = path_list[:-1]
This gives you a list with each part as an entry and excludes the last one if it contains ".html". Note that the first entry is an empty string, because the path starts with "/".
Depending on exactly what you want, or how specific or general your use case is, you can adapt this.
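As one possible variation, the standard library's pathlib can do the splitting and the ".html" check, since a URL path uses the same "/" separator as a POSIX path. This is a sketch of a swapped-in technique, not the approach used above; the function name `parent_segments` is made up for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

def parent_segments(url):
    """Return the directory segments of the URL path, without a trailing .html page."""
    path = PurePosixPath(urlparse(url).path)
    # Drop the final component only when it looks like an HTML page.
    if path.suffix == ".html":
        path = path.parent
    # For absolute paths, parts begins with the root; skip it.
    return [p for p in path.parts if p != path.anchor]

url = 'https://grabagun.com/firearms/handguns/semi-automatic-handguns/glock-19-gen-5-polished-nickel-9mm-4-02-inch-barrel-15-rounds-exclusive.html'
print(parent_segments(url))
# ['firearms', 'handguns', 'semi-automatic-handguns']
```

Unlike the substring test `".html" in ...`, the suffix check only matches when ".html" is the actual extension of the last component.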
Upvotes: 1