abpta

Reputation: 11

Scraping Javascript with "onclick"

I am having some trouble scraping the URL below: http://102.37.123.153/Lists/eTenders/AllItems.aspx I am using Python with Selenium, but there are many "onclick" JavaScript events I have to run through to reach the lowest level of information. Does anyone know how to automate this? Thanks

import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

url = 'http://102.37.123.153/Lists/eTenders/AllItems.aspx'
chrome_options = Options()
chrome_options.add_argument("--headless")
browser = webdriver.Chrome('c:/Users/AB/Dropbox/ITProjects/Scraping/chromedriver.exe',
                           options=chrome_options)
browser.get(url)  # get() returns None, so there is nothing useful to assign
time.sleep(10)
source = browser.page_source
soup = BeautifulSoup(source, 'html.parser')  # name a parser to avoid a warning
for link in soup.find_all('a'):
    if link.get('href') == 'javascript:':
        print(link)

Upvotes: 1

Views: 198

Answers (1)

Ahmed I. Elsayed

Reputation: 2130

You don't need Selenium for this website; you need patience. Let me explain how you'd approach it.

  1. Click X
    • Y opens, click Y
      • Z opens, click Z.
        • and so on…

What happened here is that when you clicked X, an AJAX request was made to fetch Y; after you click Y, another AJAX request fetches Z, and so it goes on.

So you can just simulate those requests: open the Network tab in your browser's developer tools and see how the page crafts each request, then make the same requests in your code. Based on each response, issue the next request, and the cycle goes on until you reach the innermost level of the tree.
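The idea above can be sketched as a small recursive walker. Everything specific here is an assumption: the endpoint path, the `id` parameter, and the `children`/`name` response shape are placeholders you would replace with whatever the Network tab actually shows. The traversal takes the fetch function as an argument so the recursion itself can be exercised without a server.

```python
import json
from urllib import request, parse

BASE = 'http://102.37.123.153'

def fetch_level(item_id):
    """Simulate the AJAX call the page makes when an item is clicked.
    The path and parameter name are placeholders -- read the real ones
    off the Network tab."""
    url = BASE + '/ajax/endpoint?' + parse.urlencode({'id': item_id})
    with request.urlopen(url) as resp:
        return json.load(resp)

def walk(item_id, fetch, depth=0, out=None):
    """Recursively follow each response down to the innermost level.
    `fetch` is passed in so the traversal can be tested without a network."""
    if out is None:
        out = []
    for child in fetch(item_id).get('children', []):  # assumed response shape
        out.append('  ' * depth + child['name'])
        walk(child['id'], fetch, depth + 1, out)
    return out
```

In real use you would call `walk(root_id, fetch_level)`; each response drives the next request, mirroring the click chain X → Y → Z.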

This approach has no UI and is, technically speaking, less friendly and harder to implement, but it's more efficient. On the other side, you can just select your clickable elements with Selenium, like

elem = driver.find_element_by_xpath('x')  # or any other find_element_by_* locator
elem.click()

and it will also work.

I'd also note that sometimes links don't trigger AJAX at all; they just hide information that is already in the source code. To know what you'll receive in your response, right-click the page and choose "View page source" (note that this is different from "Inspect element", which shows the DOM after JavaScript has run).
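A quick way to check for that case is to scan the raw HTML (what "View page source" shows) for the links or text you're after. This is a minimal sketch; the regex-over-HTML approach is only for a quick check, and in real code you'd parse with BeautifulSoup as in the question.

```python
import re

def hrefs_in_source(html):
    """Return every href found in the raw page source, including ones a
    script would hide at render time -- if your target appears here, you
    don't need Selenium or AJAX simulation at all."""
    return re.findall(r'href="([^"]+)"', html)
```

If the links you need show up in `hrefs_in_source(browser.page_source)` (or in the body of a plain HTTP GET), you can scrape them directly from the HTML.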

Upvotes: 1
