SY Moon

Reputation: 121

How to remove part of an href link?

I'm scraping Google search results. Here is the relevant part of my code.

from selenium.common import exceptions


def select_wholePage(driver):
    # Collect the direct child <div> elements of #main as candidate result blocks.
    items = driver.find_elements_by_xpath('//*[@id="main"]/div')
    return items


def get_result(item_in):
    try:
        title = item_in.find_element_by_xpath('.//div/div/a/h3/div').text
        print(title)
    except exceptions.NoSuchElementException:
        return
    try:
        link = item_in.find_element_by_xpath('.//div/div/a').get_attribute('href')
        print(link)
    except exceptions.NoSuchElementException:
        return
    result = (title, link)
    return result

Output: I can get the desired elements, but when I print the link, it is prefixed with "https://www.google.com/url?q=".

How can I remove that prefix?

Upvotes: 0

Views: 207

Answers (3)

Ram

Reputation: 4779

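You could strip off that prefix using str.removeprefix() (Python 3.9+). Note that lstrip() is not a safe choice here: it treats its argument as a set of characters to strip rather than as a prefix, so it can also remove the beginning of the target URL.

s = "https://www.google.com/url?q=<some_query>"
s = s.removeprefix("https://www.google.com/url?q=")
print(s)
<some_query>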
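
A quick check of the difference between stripping a set of characters and removing a prefix may help here. This is only a sketch: the example.com target is a made-up value, and str.removeprefix() assumes Python 3.9 or later.

```python
prefix = "https://www.google.com/url?q="
wrapped = prefix + "https://www.example.com"  # hypothetical wrapped link

# lstrip() removes every leading character that occurs anywhere in `prefix`,
# so it keeps going past the prefix and into the target URL.
stripped = wrapped.lstrip(prefix)

# removeprefix() removes the exact leading string once, or nothing at all.
unwrapped = wrapped.removeprefix(prefix)

print(stripped)   # xample.com
print(unwrapped)  # https://www.example.com
```

The lstrip() call stops only at the first character that does not occur in the prefix (here the "x" of "example"), which is why the result is truncated.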

Upvotes: 2

Tranbi

Reputation: 12721

I don't know if it's the cleanest way, but you could slice off the prefix by its length:

google_url_prefix = "https://www.google.com/url?q="
url_cut_id = len(google_url_prefix)
link = link[url_cut_id:]
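
Slicing by length cuts the first characters of any link, whether or not it actually starts with the prefix. A minimal sketch of a guarded variant follows; the helper name strip_google_prefix and the constant name are made up for illustration.

```python
GOOGLE_URL_PREFIX = "https://www.google.com/url?q="


def strip_google_prefix(link):
    """Return link without the Google redirect prefix, if it has one."""
    if link.startswith(GOOGLE_URL_PREFIX):
        return link[len(GOOGLE_URL_PREFIX):]
    # Links that do not carry the prefix are returned unchanged.
    return link
```

In get_result, this could be applied to the value returned by get_attribute('href') before it is printed.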

Upvotes: 1

Daweo

Reputation: 36590

If https://www.google.com/url?q= is fixed and always present, the .replace method should suffice, i.e.:

encased = "https://www.google.com/url?q=https://www.example.com"
core = encased.replace("https://www.google.com/url?q=", "", 1)
print(core)

output

https://www.example.com

I passed a third argument to .replace, which limits it to at most 1 replacement, in case https://www.google.com/url?q= appears again later in the string.
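
If the prefix is not the only thing wrapped around the target, an alternative is to read the q parameter out of the query string with the standard library. This is a sketch under assumptions the question does not confirm: that the link may carry extra parameters after the target (the &sa=U below is a made-up example) and that the target may be percent-encoded. The function name extract_target is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def extract_target(link):
    """Return the value of the `q` query parameter of a Google redirect link.

    Links that are not of the form https://www.google.com/url?... or that
    have no `q` parameter are returned unchanged.
    """
    parts = urlsplit(link)
    if parts.netloc == "www.google.com" and parts.path == "/url":
        q_values = parse_qs(parts.query).get("q")
        if q_values:
            # parse_qs has already decoded percent-escapes in the value.
            return q_values[0]
    return link


wrapped = "https://www.google.com/url?q=https://www.example.com/page%3Fid%3D1&sa=U"
print(extract_target(wrapped))  # https://www.example.com/page?id=1
```

Compared with removing a fixed prefix, this drops any trailing parameters and decodes the target, at the cost of a little more code.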

Upvotes: 1
