Python 3.5 beautifulsoup unable to read page

Question

When I go through the following process:

1. Open the link in a browser: http://propaccess.traviscad.org/clientdb/?cid=1
2. In the property search box, type "Jim" and hit search.
3. Click the "view details" column of the first result.

These steps take me to the following URL, where I can see the data: http://propaccess.traviscad.org/clientdb/Property.aspx?prop_id=228792

However, if I use the following code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'http://propaccess.traviscad.org/clientdb/Property.aspx?prop_id=312669'
# Fetch the property page directly, without visiting the search page first
soup = BeautifulSoup(urlopen(url).read(), 'html.parser')
print(soup)

Instead of the data, I get the following page:

Travis Property Search

Please try again

Sorry for the inconvenience but your session has either timed out or the server is busy handling other requests. You may visit us on the the following website for information, otherwise please retry your search again shortly:


Travis Central Appraisal District Website 
Click here to reload the property search to try again

I have tried other approaches, such as handling cookies manually, but I am still unable to read the data using Python.

Mark · Accepted Answer

Try something like this:

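import requests
from bs4 import BeautifulSoup

# A Session keeps cookies between requests
s = requests.Session()
# Load the search page first so the server can set its session cookie
s.get('http://propaccess.traviscad.org/clientdb/?cid=1')
# The session sends that cookie back with this request
r2 = s.get('http://propaccess.traviscad.org/clientdb/Property.aspx?prop_id=312669')

soup = BeautifulSoup(r2.text, 'html.parser')
print(soup.prettify())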

The first request loads the page that establishes the session, and the session object stores the cookie the server sets. The second request sends that cookie back automatically, so the server should return the property page instead of the "Please try again" message. You can then hand the response text to BeautifulSoup for parsing.
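
If installing requests is not an option, the same two-step idea can be sketched with the standard library alone. This is a rough sketch rather than a tested solution for this site: the helper names (make_session_opener, is_session_error, fetch_property_html) are made up for illustration, the check for the timeout page assumes that the text "Please try again" appears only on that page, and the UTF-8 decoding is a guess about the server's encoding.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener

BASE_URL = 'http://propaccess.traviscad.org/clientdb/'

# Text taken from the timeout page quoted in the question (assumption:
# it does not appear on a normal property page).
ERROR_MARKER = 'Please try again'


def make_session_opener():
    """Return (opener, jar): an opener that stores and resends cookies."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar


def is_session_error(html):
    """Return True if the HTML looks like the timeout page, not the data."""
    return ERROR_MARKER in html


def fetch_property_html(prop_id, timeout=30):
    """Visit the search page first, then fetch the property page."""
    opener, _jar = make_session_opener()
    # First request: lets the server set its session cookie in the jar.
    opener.open(BASE_URL + '?cid=1', timeout=timeout).close()
    # Second request: the opener sends the stored cookie back.
    url = '{}Property.aspx?prop_id={}'.format(BASE_URL, prop_id)
    with opener.open(url, timeout=timeout) as response:
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        html = response.read().decode('utf-8', errors='replace')
    if is_session_error(html):
        raise RuntimeError('Server returned the session-timeout page')
    return html
```

Calling fetch_property_html(312669) should return the page source as a string, which can be passed to BeautifulSoup(html, 'html.parser') exactly as in the answer above. Raising an error on the timeout page makes a lost session obvious instead of silently parsing the wrong page.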
