Reputation: 95
This is the URL:
url = "www.face.com/me/4000517004580.html?gps-id=5547572&scm=1007.19201.130907.0&scm_id=1007.19201.130907.0&scm-url=1007.19201.130907.0&pvid=56aacc48-cc78-4cb9-b176-c9acb7a0662c"
I need to remove the part after the .html, so it becomes:
"www.face.com/me/4000517004580.html"
Upvotes: 0
Views: 74
Reputation: 1557
The built-in urllib library can be used here.
from urllib.parse import urljoin, urlparse
url = 'www.face.com/me/4000517004580.html?gps-id=5547572&scm=1007.19201.130907.0&scm_id=1007.19201.130907.0&scm-url=1007.19201.130907.0&pvid=56aacc48-cc78-4cb9-b176-c9acb7a0662c'
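# urljoin() only rebuilds the URL correctly when it has a scheme such as "https://".
# Without one, urlparse() already returns everything before the "?" as the path.
path = urlparse(url).path
output = urljoin(url, path) if "://" in url else path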
Upvotes: 1
Reputation: 337
You can use Python's urllib to parse the URL into its parts and then remove the query string:
from urllib.parse import urlparse
url = "www.face.com/me/4000517004580.html?gps-id=5547572&scm=1007.19201.130907.0&scm_id=1007.19201.130907.0&scm-url=1007.19201.130907.0&pvid=56aacc48-cc78-4cb9-b176-c9acb7a0662c"
parse_result = urlparse(url)
url = parse_result._replace(query="").geturl() # Remove query from url
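As a minimal sketch, the same idea can be wrapped in a small helper. The name `strip_query` is hypothetical, the second URL (with a scheme and a `#top` fragment) is made up for illustration, and clearing the fragment as well is an assumption about what is wanted:

```python
from urllib.parse import urlsplit


def strip_query(url: str) -> str:
    """Return url without its query string and fragment (hypothetical helper)."""
    parts = urlsplit(url)
    # _replace() returns a new SplitResult; geturl() reassembles the pieces.
    return parts._replace(query="", fragment="").geturl()


# Shortened version of the URL from the question (no scheme).
print(strip_query("www.face.com/me/4000517004580.html?gps-id=5547572&scm=1007.19201.130907.0"))
# Illustrative URL with a scheme and a fragment.
print(strip_query("https://www.face.com/me/4000517004580.html?gps-id=5547572#top"))
```

Both calls should print the address up to and including `.html`, with the scheme kept when one is present.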
Upvotes: 2
Reputation: 83577
When you are not sure how to approach a problem, I suggest starting with some documentation. For example, you can check out the string methods and common string operations.
Scrolling through this list, you will read about the find() method:
Return the lowest index in the string where substring sub is found within the slice s[start:end]. Optional arguments start and end are interpreted as in slice notation. Return -1 if sub is not found.
So to find the "?" you can do this:
i = url.find("?")
Rather than thinking about how to remove part of the string, let's figure out how to keep the part we want. We can do this with a slice:
url = url[:i]
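One caveat is worth a quick sketch: find() returns -1 when there is no "?", and slicing with -1 would then cut off the last character. A guarded version is shown here with a shortened copy of the URL, alongside str.partition() as an alternative (my suggestion, not part of the steps above):

```python
url = "www.face.com/me/4000517004580.html?gps-id=5547572"

i = url.find("?")
# find() returns -1 when "?" is absent; url[:-1] would drop the last character,
# so only slice when the separator was actually found.
trimmed = url[:i] if i != -1 else url

# str.partition() splits at the first "?" and returns the whole string
# as the first element when the separator is missing.
head, _, _ = url.partition("?")

print(trimmed)
print(head)
```

Either form leaves a URL without a query string unchanged.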
Upvotes: 1
Reputation: 3598
Try:
url.split('.html')[0]+'.html'
result:
'www.face.com/me/4000517004580.html'
Upvotes: 1