Reputation: 154
I've written a Python script using Selenium to scrape the website address listed under Contact details on a web page.
However, the problem is that no URL is associated with that link (I can click on it, though).
How can I parse the website link located within Contact details?
from selenium import webdriver

URL = 'https://www.truelocal.com.au/business/vitfit/sydney'

def get_website_link(driver, link):
    driver.get(link)
    website = driver.find_element_by_css_selector("[ng-class*='getHaveSecondaryWebsites'] > span").text
    print(website)

if __name__ == '__main__':
    driver = webdriver.Chrome()
    try:
        get_website_link(driver, URL)
    finally:
        driver.quit()
When I run the script, I get the visible text associated with that link, which is Visit website.
Upvotes: 1
Views: 302
Reputation: 12255
The element with the "Visit website" text is a span that has vm.openLink(vm.getReadableUrl(vm.getPrimaryWebsite()),'_blank') attached as JavaScript and no actual href.
My suggestion: if your goal is scraping rather than testing, you can use the solution below, which uses the requests
package to get the data as JSON and extract any information you need.
The other option is to actually click the link, as you did.
import requests
import re

headers = {
    'Referer': 'https://www.truelocal.com.au/business/vitfit/sydney',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/73.0.3683.75 Safari/537.36',
    'DNT': '1',
}

response = requests.get('https://www.truelocal.com.au/www-js/configuration.constant.js?v=1552032205066',
                        headers=headers)
assert response.ok

# Extract the token from the response text
token = re.search(r"token:\s'(.*)'", response.text)[1]

headers['Accept'] = 'application/json, text/plain, */*'
headers['Origin'] = 'https://www.truelocal.com.au'
response = requests.get(f'https://api.truelocal.com.au/rest/listings/vitfit/sydney?&passToken={token}', headers=headers)
assert response.ok

# Use response.text to get the full JSON as text and see what information can be extracted
contact = response.json()["data"]["listing"][0]["contacts"]["contact"]
website = list(filter(lambda x: x["type"] == "website", contact))[0]["value"]
print(website)
print("the end")
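If you want to try the two parsing steps without hitting the site, here is a rough offline sketch. The sample JavaScript string and the sample payload are made up for illustration and only mirror the keys used above; the real responses may be laid out differently. The helper names extract_token and extract_website are also hypothetical. Compared with the code above, this version returns None instead of raising when the token or the website entry is missing.

```python
import re

# Hypothetical excerpt of configuration.constant.js; the real file's layout may differ.
SAMPLE_CONFIG_JS = "angular.module('app').constant('config', { token: 'abc123XYZ', other: 'value' });"

# Hypothetical, trimmed-down API payload containing only the keys read above.
SAMPLE_PAYLOAD = {
    "data": {
        "listing": [
            {
                "contacts": {
                    "contact": [
                        {"type": "phone", "value": "0400 000 000"},
                        {"type": "website", "value": "http://www.example.com"},
                    ]
                }
            }
        ]
    }
}


def extract_token(js_text):
    """Return the first single-quoted value following 'token:', or None."""
    # [^']* stops at the first closing quote, unlike a greedy (.*),
    # which would run on to the last quote on the same line.
    match = re.search(r"token:\s*'([^']*)'", js_text)
    return match.group(1) if match else None


def extract_website(payload):
    """Return the value of the first 'website' contact, or None if absent."""
    listings = payload.get("data", {}).get("listing", [])
    if not listings:
        return None
    contacts = listings[0].get("contacts", {}).get("contact", [])
    # next() with a default avoids the IndexError that [0] on an empty list raises.
    return next((c.get("value") for c in contacts if c.get("type") == "website"), None)


print(extract_token(SAMPLE_CONFIG_JS))
print(extract_website(SAMPLE_PAYLOAD))
```

With the live site you would pass response.text to extract_token and response.json() to extract_website, assuming the responses still have the shape shown in the answer.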
Upvotes: 1