MITHU

Reputation: 154

Can't parse a website link from a webpage

I've written a Python script with Selenium to scrape the website address listed under Contact details on a webpage. The problem is that there is no URL attached to that link, although I can click on it.

How can I parse the website link located within Contact details?

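from selenium import webdriver
from selenium.webdriver.common.by import By

URL = 'https://www.truelocal.com.au/business/vitfit/sydney'

def get_website_link(driver, link):
    driver.get(link)
    # The "Visit website" element inside Contact details
    # (find_element_by_css_selector was removed in Selenium 4; use find_element with By)
    website = driver.find_element(By.CSS_SELECTOR, "[ng-class*='getHaveSecondaryWebsites'] > span").text
    print(website)

if __name__ == '__main__':
    driver = webdriver.Chrome()
    try:
        get_website_link(driver, URL)
    finally:
        # Always close the browser, even if the lookup fails
        driver.quit()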

When I run the script, I only get the visible text associated with that link, which is "Visit website".

Upvotes: 1

Views: 302

Answers (1)

Sers

Reputation: 12255

The element with the "Visit website" text is a span that opens the link through the JavaScript call vm.openLink(vm.getReadableUrl(vm.getPrimaryWebsite()),'_blank') rather than through an actual href, so there is no URL in the markup to read. If your goal is scraping rather than testing, you can use the solution below, which uses the requests package to fetch the data as JSON so you can extract any information you need.
The other option is to actually click the link, as you mentioned.
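To make the span-versus-href point concrete, here is a minimal sketch using only the standard library's html.parser on a hypothetical, simplified version of the markup. The ng-click attribute name is an assumption; the live page may attach the JavaScript differently. It shows that the span's text is the only readable value: there is no href, just a JavaScript expression.

```python
from html.parser import HTMLParser

# Hypothetical, simplified markup: the real page may use different attributes.
SNIPPET = (
    '<span ng-click="vm.openLink(vm.getReadableUrl(vm.getPrimaryWebsite()),\'_blank\')">'
    'Visit website</span>'
)

class SpanInspector(HTMLParser):
    """Collect the attributes and text of every <span> in the input."""

    def __init__(self):
        super().__init__()
        self.spans = []        # list of {"attrs": ..., "text": ...} dicts
        self._current = None   # the span currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._current = {"attrs": dict(attrs), "text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._current is not None:
            self.spans.append(self._current)
            self._current = None

parser = SpanInspector()
parser.feed(SNIPPET)
span = parser.spans[0]
print(span["text"])               # Visit website
print(span["attrs"].get("href"))  # None: there is no URL to read
print(span["attrs"]["ng-click"])  # only the JavaScript expression
```

This is why Selenium's .text returns "Visit website": the URL is only computed by the page's JavaScript at click time.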

import re

import requests

headers = {
    'Referer': 'https://www.truelocal.com.au/business/vitfit/sydney',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/73.0.3683.75 Safari/537.36',
    'DNT': '1',
}

# configuration.constant.js contains the token the API expects
response = requests.get('https://www.truelocal.com.au/www-js/configuration.constant.js?v=1552032205066',
                        headers=headers)
response.raise_for_status()

# Extract the token from the response text
token = re.search(r"token:\s*'([^']*)'", response.text).group(1)

headers['Accept'] = 'application/json, text/plain, */*'
headers['Origin'] = 'https://www.truelocal.com.au'

# Request the listing data as JSON, passing the token as a query parameter
response = requests.get('https://api.truelocal.com.au/rest/listings/vitfit/sydney',
                        params={'passToken': token}, headers=headers)
response.raise_for_status()
# Use response.text to see the full JSON and what other information can be extracted.

# Take the value of the first contact whose type is "website"
contact = response.json()["data"]["listing"][0]["contacts"]["contact"]
website = next(c["value"] for c in contact if c["type"] == "website")
print(website)
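The two parsing steps in the answer can also be pulled into small functions so they can be checked without any network access. This is a sketch, not part of the original answer: the function names extract_token and find_website are hypothetical, and the sample JavaScript line and JSON payload are made-up inputs shaped only after the keys the code above reads (data, listing, contacts, contact).

```python
import json
import re
from typing import Optional

# Same idea as the regex above: capture the quoted value after "token:".
TOKEN_RE = re.compile(r"token:\s*'([^']*)'")

def extract_token(js_text: str) -> str:
    """Return the passToken embedded in configuration.constant.js."""
    match = TOKEN_RE.search(js_text)
    if match is None:
        raise ValueError("token not found; the page layout may have changed")
    return match.group(1)

def find_website(payload: dict) -> Optional[str]:
    """Return the first contact of type 'website', or None if there is none."""
    contacts = payload["data"]["listing"][0]["contacts"]["contact"]
    return next((c["value"] for c in contacts if c.get("type") == "website"), None)

# Hypothetical sample inputs, shaped after the keys used in the answer.
sample_js = "angular.module('app').constant('config', { token: 'abc123', env: 'prod' });"
sample_json = json.loads("""
{"data": {"listing": [{"contacts": {"contact": [
    {"type": "phone", "value": "02 0000 0000"},
    {"type": "website", "value": "http://www.example.com"}
]}}]}}
""")

print(extract_token(sample_js))   # abc123
print(find_website(sample_json))  # http://www.example.com
```

Passing a default to next() means a listing without a website yields None instead of raising, and the explicit ValueError makes a changed token format easier to diagnose than a failed regex match.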

Upvotes: 1
