Ravanelli

Reputation: 95

Remove part of a URL using regex

This is the URL:

url = "www.face.com/me/4000517004580.html?gps-id=5547572&scm=1007.19201.130907.0&scm_id=1007.19201.130907.0&scm-url=1007.19201.130907.0&pvid=56aacc48-cc78-4cb9-b176-c9acb7a0662c"

I need to remove everything after .html, so that it becomes:

"www.face.com/me/4000517004580.html"

Upvotes: 0

Views: 74

Answers (4)

JPI93

Reputation: 1557

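The built-in urllib library can be used here. Note that urljoin only resolves the path correctly against a base URL that has a scheme (such as https://); for a scheme-less string like the one in the question, urlparse already places everything before the "?" in .path.

from urllib.parse import urljoin, urlparse

url = 'www.face.com/me/4000517004580.html?gps-id=5547572&scm=1007.19201.130907.0&scm_id=1007.19201.130907.0&scm-url=1007.19201.130907.0&pvid=56aacc48-cc78-4cb9-b176-c9acb7a0662c'
parsed = urlparse(url)
# With a scheme, rebuild the URL from its path; without one, .path is already the result.
output = urljoin(url, parsed.path) if parsed.scheme else parsed.path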

Upvotes: 1

sqz

Reputation: 337

You can use Python's urllib to parse the URL into its parts and then rebuild it without the query string:

from urllib.parse import urlparse
url = "www.face.com/me/4000517004580.html?gps-id=5547572&scm=1007.19201.130907.0&scm_id=1007.19201.130907.0&scm-url=1007.19201.130907.0&pvid=56aacc48-cc78-4cb9-b176-c9acb7a0662c"

parse_result = urlparse(url)
url = parse_result._replace(query="").geturl()  # Remove query from url
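The same idea can be wrapped in a small helper. This is only a sketch: the name strip_query is hypothetical (it is not part of urllib), and clearing the fragment as well goes slightly beyond what the question asks.

```python
from urllib.parse import urlparse


def strip_query(url: str) -> str:
    """Return url without its query string or fragment (hypothetical helper)."""
    parts = urlparse(url)
    # ParseResult is a namedtuple, so _replace returns a modified copy
    return parts._replace(query="", fragment="").geturl()


# Works with or without a scheme
short = strip_query("www.face.com/me/4000517004580.html?gps-id=5547572&scm=1007.19201.130907.0")
full = strip_query("https://www.face.com/me/4000517004580.html?gps-id=5547572#top")
print(short)
print(full)
```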

Upvotes: 2

Code-Apprentice

Reputation: 83577

When you are not sure how to approach a problem, I suggest starting with some documentation. For example, you can check out the string methods and common string operations.

Scrolling through this list, you will read about the str.find() method:

Return the lowest index in the string where substring sub is found within the slice s[start:end]. Optional arguments start and end are interpreted as in slice notation. Return -1 if sub is not found.

So to find the "?" you can do this:

i = url.find("?")

Rather than thinking about how to remove part of the string, let's figure out how to keep the part we want. We can do this with a slice:

url = url[:i]
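One caveat with the slice: find() returns -1 when there is no "?", and url[:-1] then drops the last character. A minimal sketch of two ways around that, assuming a hypothetical helper name before_query:

```python
def before_query(url: str) -> str:
    """Return the part of url before the first "?" (hypothetical helper)."""
    # partition returns the whole string as the head when "?" is absent
    head, _sep, _tail = url.partition("?")
    return head


def before_query_find(url: str) -> str:
    """Same result using find(), guarding against the -1 case."""
    i = url.find("?")
    return url if i == -1 else url[:i]


with_query = "www.face.com/me/4000517004580.html?gps-id=5547572"
without_query = "www.face.com/me/4000517004580.html"

print(before_query(with_query))
print(before_query(without_query))
# The unguarded slice loses the final "l" when there is no "?"
print(without_query[: without_query.find("?")])
```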

Upvotes: 1

ipj

Reputation: 3598

Try splitting on '.html' and adding it back (this assumes the URL contains '.html'):

url.split('.html')[0] + '.html'

result:

'www.face.com/me/4000517004580.html'
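Since the question asks about regex, the same result can also be sketched with the re module. The shortened URL and the variable names no_query and up_to_html are only for illustration; which pattern fits depends on whether you want to cut at the "?" or at ".html".

```python
import re

url = "www.face.com/me/4000517004580.html?gps-id=5547572&scm=1007.19201.130907.0"

# Option 1: drop everything from the first "?" to the end of the string
no_query = re.sub(r"\?.*$", "", url)

# Option 2: keep everything up to and including the first ".html"
match = re.search(r"^.*?\.html", url)
up_to_html = match.group(0) if match else url  # fall back to the input if ".html" is absent

print(no_query)
print(up_to_html)
```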

Upvotes: 1
