Reputation: 4807
When I go through the following process:
These steps take me to the following URL, where you can see the data: http://propaccess.traviscad.org/clientdb/Property.aspx?prop_id=228792
However, if I use the following code:
from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup
url = 'http://propaccess.traviscad.org/clientdb/Property.aspx?prop_id=312669'
soup = BeautifulSoup(urlopen(url).read())
print soup
I get the following "session timed out" page instead of the property data:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-type" content="text/html;charset=utf-8" />
<title>Travis Property Search</title>
<style type="text/css">
body { text-align: center; padding: 150px; }
h1 { font-size: 50px; }
body { font: 20px Helvetica, sans-serif; color: #333; }
#article { display: block; text-align: left; width: 650px; margin: 0 auto; }
a { color: #dc8100; text-decoration: none; }
a:hover { color: #333; text-decoration: none; }
</style>
</head>
<body>
<div id="article">
<h1>Please try again</h1>
<div>
<p>Sorry for the inconvenience but your session has either timed out or the server is busy handling other requests. You may visit us on the the following website for information, otherwise please retry your search again shortly:<br /><br />
<a href="http://www.traviscad.org/">Travis Central Appraisal District Website</a> </p>
<p><b><a href="http://propaccess.traviscad.org/clientdb/?cid=1">Click here to reload the property search to try again</a></b></p>
</div>
</div>
</body>
</html>
I have tried other approaches, such as handling cookies, but I am not able to read the data using Python.
Upvotes: 0
Views: 142
Reputation: 92471
Try something like this:
import requests
from bs4 import BeautifulSoup

s = requests.session()

# Hit the search landing page first so the server issues a session cookie.
r = s.get('http://propaccess.traviscad.org/clientdb/?cid=1')

# Reuse the same session (and its cookie) for the property page.
r2 = s.get('http://propaccess.traviscad.org/clientdb/Property.aspx?prop_id=312669')

soup = BeautifulSoup(r2.text, 'html.parser')
print(soup.prettify())
The first request grabs the page that establishes the session, and the requests session object stores the cookie the server sends back. The second request sends that cookie along, so the server returns the property page instead of the timeout message, and you can hand that HTML to BeautifulSoup for parsing.
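If the second request still comes back with the timeout page, it is worth confirming that the first request actually set a cookie before blaming the parsing step. A minimal sanity check along these lines (the ASP.NET_SessionId cookie name and the "Please try again" heading test are just illustrative, based on the page shown in the question):
import requests
from bs4 import BeautifulSoup

s = requests.session()
s.get('http://propaccess.traviscad.org/clientdb/?cid=1')

# The server should have handed back at least one cookie here
# (for an ASP.NET site, typically something like ASP.NET_SessionId).
print(s.cookies.get_dict())

r = s.get('http://propaccess.traviscad.org/clientdb/Property.aspx?prop_id=312669')
soup = BeautifulSoup(r.text, 'html.parser')

# The timeout page shown in the question has an <h1> of "Please try again";
# if that heading is absent, the real property page most likely came back.
if soup.find('h1', string='Please try again'):
    print('still getting the session-timeout page')
else:
    print('got the property page:', soup.title.string if soup.title else '(no title)')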
Upvotes: 1