Reputation: 11
I am using selenium webdriver to try scrape information from realestate.com.au, here is my code:
from selenium.webdriver import Chrome from bs4 import BeautifulSoup
path = 'C:\Program Files (x86)\Google\Chrome\Application\chromedriver.exe'
url = 'https://www.realestate.com.au/buy'
url2 = 'https://www.realestate.com.au/property-house-nsw-castle+hill-134181706'
webdriver = Chrome(path)
webdriver.get(url)
soup = BeautifulSoup(webdriver.page_source, 'html.parser')
print(soup)
it works fine with URL but when I try to do the same to open url2, it opens up a blank page, and I checked the console get the following: "Failed to load resource: the server responded with a status of 429 () about:blank:1 Failed to load resource: net::ERR_UNKNOWN_URL_SCHEME 149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/fingerprint:1 Failed to load resource: the server responded with a status of 404 ()"
while opening up URL, I tried to search for anything, which also leads to a blank page like url2.
Upvotes: 1
Views: 1312
Reputation: 123
It looks like the www.realestate.com.au website is using an Akamai security tool.
A quick DNS lookup shows that www.realestate.com.au resolves to dualstack.realestate.com.au.edgekey.net.
They are most likely using the Bot Manager product (https://www.akamai.com/us/en/products/security/bot-manager.jsp). I have encountered this on another website recently.
Typically rotating user agents and IP addresses (ideally using residential proxies) should do the trick. You want to load up the site with a "fresh" browser profile each time. You should also check out https://github.com/67-6f-64/akamai-sensor-data-bypass
Upvotes: 1
Reputation: 71
I think you should try adding driver.implicitly_wait(10)
before your get line, as this will add an implicit wait, in case the page loads too slowly for the driver to pull the site. Also you should consider trying out the Firefox webdriver, since this bug appears to be only affecting chromium browsers.
Upvotes: 0