Venkateshwaran Selvaraj

Reputation: 1785

Click and scrape aspx page using BS4 python

I am trying to scrape a website by clicking on a button. I tried using Firebug and the Google Chrome console, but I could not capture the request the page sends when the button is clicked, which would let me skip the click entirely. The only requests I see when I click the search button on the following URL are for two .js files:

http://www.icsi.edu/Facilities/MembersDirectory.aspx

Upvotes: 3

Views: 2447

Answers (1)

Jeff

Reputation: 227

I think the easiest way to handle this would be to use Selenium's WebDriver.

Link: http://www.seleniumhq.org/docs/03_webdriver.jsp#introducing-webdriver

If you have pip installed, a simple

pip install selenium

should work. I recommend using Firefox as your browser.

You could use Selenium to download the pages and then parse them with BS4 afterwards; a sketch of that step follows the script below. Here's a simple script that inputs "Foo" and "Bar" into the form and then clicks the "Search" button.

from selenium import webdriver
driver = webdriver.Firefox()
driver.get("http://www.icsi.edu/Member/MembersDirectory.aspx")
# Alternatively, link directly to the form:
# driver.get("https://www.icsi.in/student/Members/MemberSearch.aspx?SkinSrc=%5BG%5DSkins/IcsiTheme/IcsiIn-Bare&ContainerSrc=%5BG%5DContainers/IcsiTheme/NoContainer")

# Locate the elements.
first = driver.find_element_by_id("dnn_ctr410_MemberSearch_txtFirstName")
last = driver.find_element_by_id("dnn_ctr410_MemberSearch_txtLastName")
search = driver.find_element_by_id("dnn_ctr410_MemberSearch_btnSearch")

# Input the data and click submit.
first.send_keys("Foo")
last.send_keys("Bar")
search.click()
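
To get the results into BS4, one approach (a minimal sketch; the row/cell selectors are assumptions, so inspect the results grid and adjust them) is to hand driver.page_source to BeautifulSoup after the click:

from bs4 import BeautifulSoup

# After search.click(), the rendered results are in driver.page_source.
soup = BeautifulSoup(driver.page_source, "html.parser")

# The selectors below are a guess -- adjust them to the actual results grid.
for row in soup.find_all("tr"):
    print([cell.get_text(strip=True) for cell in row.find_all("td")])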

As a bonus, here's how to iterate through the pages of results:

# next_page should be redeclared every time you visit a new page.
next_page = driver.find_element_by_class_name("rgPageNext")
next_page.click()
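
If you want to walk through every page of results, a rough loop might look like this (just a sketch: it assumes the rgPageNext element is no longer findable on the last page, which you should verify, and it uses a plain sleep instead of an explicit wait to keep things short):

import time
from selenium.common.exceptions import NoSuchElementException

while True:
    # Scrape/parse the current page here (e.g. with the BS4 snippet above).
    try:
        # Re-locate the link on every iteration; the old element goes stale
        # once the postback reloads the grid.
        next_page = driver.find_element_by_class_name("rgPageNext")
    except NoSuchElementException:
        break  # no "next" arrow found, so assume this was the last page
    next_page.click()
    time.sleep(2)  # crude wait for the new results to render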

Upvotes: 3
