Reputation: 11
I am having some trouble scraping the URL below: http://102.37.123.153/Lists/eTenders/AllItems.aspx I am using Python with Selenium, but there are many "onclick" JavaScript events I have to trigger to get to the lowest level of information. Does anyone know how to automate this? Thanks
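import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

url = 'http://102.37.123.153/Lists/eTenders/AllItems.aspx'
chrome_options = Options()
chrome_options.add_argument("--headless")
browser = webdriver.Chrome('c:/Users/AB/Dropbox/ITProjects/Scraping/chromedriver.exe', options=chrome_options)
browser.get(url)
time.sleep(10)  # crude wait for the page's JavaScript to finish rendering
source = browser.page_source
soup = BeautifulSoup(source, 'html.parser')
# Print every anchor whose href is a bare "javascript:" (these are driven by onclick handlers)
for link in soup.find_all('a'):
    if link.get('href') == 'javascript:':
        print(link)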
Upvotes: 1
Views: 198
Reputation: 2130
You don't need Selenium for this website; you need patience. Let me explain how you'd approach it.
What happens here is that when you click X, an AJAX request is made to get Y; when you then click Y, another AJAX request is made to get Z, and so on.
So you can simply simulate those requests. Open the Network tab in your browser's developer tools and see how the page crafts each request, then make the same request in your code and read the response. Based on that response, make the next request, and repeat the cycle until you reach the innermost level of the tree.
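As a rough sketch of that request-replay cycle, the function below walks a tree by asking for each node's children in turn. Everything site-specific here is an assumption: EXPAND_URL, the "node" form field, and the JSON shape (a list of objects with an "id" key) are hypothetical placeholders, and the real values have to be copied from what the Network tab shows.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint; replace with the request URL seen in the Network tab.
EXPAND_URL = "http://example.invalid/expand"


def http_fetch_children(node_id):
    """Replay one 'expand' request. Assumes a form-encoded POST that returns JSON."""
    data = urllib.parse.urlencode({"node": node_id}).encode()
    request = urllib.request.Request(EXPAND_URL, data=data)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


def walk_tree(fetch_children, node_id, depth=0, max_depth=10):
    """Fetch the children of node_id, then recurse into each child.

    fetch_children is any callable that maps a node id to a list of
    dicts with an "id" key; max_depth guards against runaway recursion.
    """
    node = {"id": node_id, "children": []}
    if depth >= max_depth:
        return node
    for child in fetch_children(node_id):
        node["children"].append(
            walk_tree(fetch_children, child["id"], depth + 1, max_depth)
        )
    return node
```

Calling walk_tree(http_fetch_children, "root") would start the crawl, where "root" is again a made-up id. Passing the fetcher in as an argument keeps the tree-walking logic separate from the request details, so it can be tried against canned data first.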
This approach has no UI and is, technically speaking, less friendly and harder to implement, but it is more efficient. Alternatively, you can just select the clickable elements with Selenium, like
elem = driver.find_element_by_xpath('x')  # 'x' is a placeholder; any find_element_by_* locator works
elem.click()
and that will also work.
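If you go the Selenium route, the clicking can be put in a loop. This is a minimal sketch, assuming you can write a locator that matches only the nodes that are still collapsed, so that each click removes one element from the match set. The helper name expand_all and the pause and max_clicks parameters are made up for illustration.

```python
import time


def expand_all(driver, by, value, pause=1.0, max_clicks=500):
    """Click the first element matching (by, value) until none match.

    Elements are looked up again after every click because the DOM changes
    when the AJAX response is rendered, which would make old references stale.
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        matches = driver.find_elements(by, value)
        if not matches:
            break
        matches[0].click()
        clicks += 1
        time.sleep(pause)  # crude wait for the AJAX response to render
    return clicks
```

With a real driver this could be called as expand_all(browser, By.CSS_SELECTOR, "a.collapsed"), with By imported from selenium.webdriver.common.by; the "a.collapsed" selector is hypothetical and has to be replaced by whatever the page actually uses. The max_clicks cap is there so the loop still ends if the locator keeps matching after a click.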
I'd also note that sometimes links don't trigger an AJAX request at all; they just hide the info, but it's already in the source code. To see what you'll receive in your response, right-click on the page and choose View page source. Note that this is different from Inspect element, which shows the live DOM after scripts have run.
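The same check can be done in code: download the raw HTML once and see whether the text you are after is already in it. The sketch below uses only the standard library; the names TextCollector and text_in_source are made up for illustration, and it deliberately ignores the contents of script and style tags.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect text nodes from raw HTML, ignoring <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # > 0 while inside a <script> or <style> element

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def text_in_source(html, needle):
    """Return True if needle appears in any text node of the raw HTML."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return any(needle in chunk for chunk in parser.chunks)
```

If text_in_source returns True for a value that only shows up on screen after a click, the link is probably just toggling visibility and no further request is needed for it. The HTML should come from a plain HTTP download rather than from the browser after clicking, since the latter may already include content added by scripts.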
Upvotes: 1