Using requests.get() on a page that uses a servlet

Question

I'm trying to scrape data from the webpage below using the requests library and BeautifulSoup in Python. Unfortunately, the website appears to use a servlet to return the data, and I'm not sure how to handle it.

I've tried both querying from the results page directly:

import requests
from bs4 import BeautifulSoup

url = 'http://a810-bisweb.nyc.gov/bisweb/PropertyProfileOverviewServlet?bin=1014398&go4=+GO+&requestid=0'
html = requests.get(url)
soup = BeautifulSoup(html.text, 'html')

And also querying from the search page:

url = 'http://a810-bisweb.nyc.gov/bisweb/bispi00.jsp'
html = requests.get(url, params={'bin':1014398})
soup = BeautifulSoup(html.text, 'html')

Both attempts end with the request timing out, presumably because I'm not formatting the request properly. Is there a way to capture the HTML of the results page?
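For comparison, here is a minimal `requests` sketch that sends the question's parameters directly to the results servlet (`PropertyProfileOverviewServlet`) rather than to the search page `bispi00.jsp`, which is a different resource. It rests on two assumptions the thread does not confirm: that the server may stall on the default `python-requests/...` User-Agent but answer a browser-like one (the header string here is an arbitrary example), and that 30 seconds is a reasonable `timeout`. The `timeout` is worth setting regardless, because without it `requests` can wait indefinitely for a response. `build_profile_request` and `fetch_profile_html` are hypothetical helper names.

```python
import requests

# Results servlet from the question (not the search form bispi00.jsp).
SERVLET_URL = 'http://a810-bisweb.nyc.gov/bisweb/PropertyProfileOverviewServlet'

# Assumption: a browser-like User-Agent; the exact string is an arbitrary example.
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}


def build_profile_request(bin_number):
    """Prepare (but do not send) the GET request for one BIN."""
    # ' GO ' is URL-encoded as '+GO+', matching the original query string.
    params = {'bin': bin_number, 'go4': ' GO ', 'requestid': 0}
    return requests.Request('GET', SERVLET_URL, params=params, headers=HEADERS).prepare()


def fetch_profile_html(bin_number, timeout=30):
    """Send the prepared request; raise instead of hanging if the server stalls."""
    prepared = build_profile_request(bin_number)
    with requests.Session() as session:
        # timeout bounds both the connect and the read phase (in seconds)
        response = session.send(prepared, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return response.text
```

Calling `fetch_profile_html(1014398)` either returns the page HTML, ready for `BeautifulSoup(..., 'html.parser')`, or raises an exception such as `requests.exceptions.Timeout` instead of hanging. If the server still refuses plain HTTP clients, the browser-based approach in the accepted answer is the fallback.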

Sushil · Accepted Answer

Try using Selenium:

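from bs4 import BeautifulSoup
from selenium import webdriver
import time

url = 'http://a810-bisweb.nyc.gov/bisweb/PropertyProfileOverviewServlet?bin=1014398&go4=+GO+&requestid=0'
# Launches a real Chrome browser; needs Chrome plus ChromeDriver (Selenium 4.6+ can fetch the driver itself)
driver = webdriver.Chrome()
try:
    driver.get(url)
    time.sleep(3)  # crude fixed pause to give the page time to finish loading
    # 'html5lib' requires the html5lib package (pip install html5lib)
    soup = BeautifulSoup(driver.page_source, 'html5lib')
finally:
    driver.quit()  # quit() ends the whole browser session; close() only closes the current window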
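The answer stops once `soup` is built. As a hypothetical next step, assuming the values of interest sit in ordinary HTML table rows (the thread does not describe the page's markup), a helper like `table_rows` flattens each `<tr>` into a list of its non-empty cell texts. It is demonstrated on a made-up HTML string, not on the real page; on pages with nested tables, an outer row will also repeat the text of the rows nested inside it.

```python
from bs4 import BeautifulSoup


def table_rows(soup):
    """Return every <tr> as a list of its non-empty cell texts (hypothetical helper)."""
    rows = []
    for tr in soup.find_all('tr'):
        # get_text(strip=True) reduces each <td>/<th> to its visible text
        cells = [cell.get_text(strip=True) for cell in tr.find_all(['td', 'th'])]
        cells = [text for text in cells if text]  # drop empty spacer cells
        if cells:
            rows.append(cells)
    return rows


# Made-up markup for illustration only; the second row is an empty spacer row.
sample = BeautifulSoup(
    '<table><tr><th>Name</th><td>Value</td></tr><tr><td></td></tr></table>',
    'html.parser',
)
print(table_rows(sample))  # [['Name', 'Value']]
```

With the answer's code, the same call would be `table_rows(soup)`.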
