Can't parse a website link from a webpage

Question

I've created a script in Python with Selenium to scrape the website address shown under Contact details on a web page. However, the link has no URL associated with it (although I can click it).

How can I parse the website link located within Contact details?

from selenium import webdriver

URL = 'https://www.truelocal.com.au/business/vitfit/sydney'

def get_website_link(driver,link):
    driver.get(link)
    website = driver.find_element_by_css_selector("[ng-class*='getHaveSecondaryWebsites'] > span").text
    print(website)

if __name__ == '__main__':
    driver = webdriver.Chrome()
    try:
        get_website_link(driver,URL)
    finally:
        driver.quit()

When I run the script, I get only the visible text associated with that link, which is Visit website.

Sers · Accepted Answer

The element with the "Visit website" text is a span that carries the JavaScript call vm.openLink(vm.getReadableUrl(vm.getPrimaryWebsite()),'_blank') rather than an actual href. If your goal is scraping rather than testing, I suggest the solution below, which uses the requests package to fetch the data as JSON so you can extract whatever information you need.
The other option is to actually click the link, as you did.
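
For the click route, here is a rough sketch rather than a tested solution. It assumes the click opens the site in a new tab (which the '_blank' argument suggests) and that the driver is a Selenium WebDriver. The function name resolve_clicked_url and its timeout and poll parameters are made up for illustration.

```python
import time

# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR,
# spelled out so this sketch has no import-time dependency on Selenium.
CSS_SELECTOR = "css selector"


def resolve_clicked_url(driver, selector, timeout=10.0, poll=0.25):
    """Click the element matching `selector` and return the URL of the tab it opens.

    Hypothetical helper: assumes the click opens a new tab or window.
    """
    original = driver.current_window_handle
    known = set(driver.window_handles)
    driver.find_element(CSS_SELECTOR, selector).click()

    deadline = time.monotonic() + timeout
    try:
        while time.monotonic() < deadline:
            # Any handle we have not seen before belongs to the new tab.
            fresh = [h for h in driver.window_handles if h not in known]
            if fresh:
                driver.switch_to.window(fresh[0])
                url = driver.current_url
                # A freshly opened tab may still report about:blank.
                if url and url != "about:blank":
                    driver.close()  # close the tab the click opened
                    return url
            time.sleep(poll)
        raise TimeoutError("no new tab with a loaded URL appeared")
    finally:
        # Always hand control back to the listing page.
        driver.switch_to.window(original)
```

With the driver from the question, after driver.get(link), this would be called as resolve_clicked_url(driver, "[ng-class*='getHaveSecondaryWebsites'] > span"). The returned URL is whatever the browser ends up on, so it reflects any redirects. The requests-based script follows.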

import requests
import re

headers = {
    'Referer': 'https://www.truelocal.com.au/business/vitfit/sydney',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/73.0.3683.75 Safari/537.36',
    'DNT': '1',
}
response = requests.get('https://www.truelocal.com.au/www-js/configuration.constant.js?v=1552032205066',
                        headers=headers)
assert response.ok

# extract the token from the response text (raw string so \s is not treated as a string escape)
token = re.search(r"token:\s'(.*)'", response.text)[1]

headers['Accept'] = 'application/json, text/plain, */*'
headers['Origin'] = 'https://www.truelocal.com.au'

response = requests.get(f'https://api.truelocal.com.au/rest/listings/vitfit/sydney?&passToken={token}', headers=headers)
assert response.ok
# use response.text to get full json as text and see what information can be extracted.

contact = response.json()["data"]["listing"][0]["contacts"]["contact"]
website = list(filter(lambda x: x["type"] == "website", contact))[0]["value"]
print(website)

print("the end")
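
The two parsing steps above (pulling the token out of the script text, then picking the website entry out of the contacts) fail with a TypeError or IndexError if the token or the website entry is missing. Below is a more defensive sketch of just those two steps. The helper names extract_token and extract_website are hypothetical, and the sample data is invented to mirror the structure the script above relies on, not copied from the real API.

```python
import re

# [^']* stops at the first closing quote, so later quoted values on the
# same line are not swallowed into the token.
TOKEN_RE = re.compile(r"token:\s*'([^']*)'")


def extract_token(js_text):
    """Return the token from the configuration script text, or raise ValueError."""
    match = TOKEN_RE.search(js_text)
    if match is None:
        raise ValueError("no token found in configuration script")
    return match.group(1)


def extract_website(payload):
    """Return the first contact of type 'website', or None if there is none."""
    contacts = payload["data"]["listing"][0]["contacts"]["contact"]
    return next((c["value"] for c in contacts if c.get("type") == "website"), None)


# Invented sample inputs (assumptions, for illustration only).
sample_js = "var cfg = { token: 'abc123', env: 'prod' };"
sample_payload = {
    "data": {
        "listing": [
            {
                "contacts": {
                    "contact": [
                        {"type": "phone", "value": "0000 000 000"},
                        {"type": "website", "value": "https://example.com"},
                    ]
                }
            }
        ]
    }
}

print(extract_token(sample_js))
print(extract_website(sample_payload))
```

In the script above, extract_token(response.text) would take the place of the re.search line and extract_website(response.json()) the place of the filter line. A listing without a website then yields None instead of an exception.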
