I am a beginner at HTML and web scraping, and I am trying to get the data shown below using Python and BeautifulSoup.
| Type | Date | Address |
|---|---|---|
| Theft | 06/24/15 08:47 PM | 2000 BLOCK OF S COLLEGE AV |
| Vandalism | 06/24/15 07:32 PM | 3600 BLOCK OF WELLBORN RD |
| Theft | 06/24/15 07:30 PM | 800 BLOCK OF RIO GRANDE LN |
| Theft | 06/24/15 06:40 PM | 1800 BLOCK OF FINFEATHER RD |
But when I parse the page at http://spotcrime.com/#77801, the div that holds this data is not in the parsed HTML, so I cannot get the data.
The code that I am using is:
```python
import urllib2
from bs4 import BeautifulSoup

html = urllib2.urlopen('http://spotcrime.com/#77801')
soup = BeautifulSoup(html.read(), 'html.parser')
print soup
```
Instead of the main crimes container, this is all that urlopen receives:
```html
<div id="table_container" class="list-group crime-list" style="margin-top: -30px;">
<h3>Loading Crime Data...</h3>
<p>City and county crime map showing crime incident data down to neighborhood crime</p>
</div>
```
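Before reaching for the API, it can help to confirm that the container really is filled in client side. The following is a minimal sketch that uses only the standard library's `html.parser` (so it does not depend on BeautifulSoup) to collect the text inside the element with `id="table_container"`; `ContainerText` and `container_text` are hypothetical helper names, and the input is the snippet shown above.

```python
from html.parser import HTMLParser

# The snippet urlopen returned, copied from the answer above.
RAW_HTML = """
<div id="table_container" class="list-group crime-list" style="margin-top: -30px;">
<h3>Loading Crime Data...</h3>
<p>City and county crime map showing crime incident data down to neighborhood crime</p>
</div>
"""

# Void elements never get an end tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ContainerText(HTMLParser):
    """Collects the non-blank text nodes inside the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # > 0 while the parser is inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def container_text(html, target_id="table_container"):
    parser = ContainerText(target_id)
    parser.feed(html)
    parser.close()
    return parser.chunks


text = container_text(RAW_HTML)
print(text)
if "Loading Crime Data..." in text:
    print("Only a placeholder: the rows are added later by JavaScript.")
```

If the only text found is the "Loading..." placeholder, no amount of HTML parsing will find the rows; the data has to come from wherever the page's JavaScript fetches it.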
This is because the main container is built in the browser by JavaScript, which makes an additional call to the http://api.spotcrime.com/crimes.json endpoint and then renders the results.
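To see what such a call looks like on the wire, the query string can be built with the standard library. This is a minimal sketch, assuming the parameter names `lat`, `lon`, `radius` and `key` used in the answers in this thread (the endpoint may have changed or been locked down since); `crimes_api_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

CRIMES_URL = "http://api.spotcrime.com/crimes.json"


def crimes_api_url(lat, lon, radius, key):
    """Build a GET URL for crimes.json with the parameters as a query string."""
    query = urlencode({"lat": lat, "lon": lon, "radius": radius, "key": key})
    return CRIMES_URL + "?" + query


print(crimes_api_url("30.6423514", "-96.3704778", "0.02", "spotcrime-private-api-key"))
# http://api.spotcrime.com/crimes.json?lat=30.6423514&lon=-96.3704778&radius=0.02&key=spotcrime-private-api-key
```

`urlencode` takes care of percent-encoding, so values containing spaces or `&` cannot break the query.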
What you can do is simulate that API call in your code with requests. Working example:
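```python
import requests

url = "http://spotcrime.com/#77801"
crimes_url = "http://api.spotcrime.com/crimes.json"
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36'}

with requests.Session() as session:
    # Send a browser-like User-Agent with every request in this session
    session.headers.update(headers)
    # Load the page first, as a browser would (the session keeps any cookies it sets)
    session.get(url)

    # Query parameters the page's JavaScript sends to the API
    params = {
        "lat": "30.6423514",
        "lon": "-96.3704778",
        "radius": "0.02",
        "key": "spotcrime-private-api-key",
        "_": "1435453242689"
    }
    # params= puts them in the URL query string; data= would send a request body instead
    response = session.get(crimes_url, params=params)
    for item in response.json()["crimes"]:
        print(item)
```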
It prints dictionaries corresponding to each row in the crime table:
```
{u'cdid': 64482204, u'lon': -96.3661035, u'lat': 30.6507387, u'link': u'http://spotcrime.com/crime/64482204-6737a0085bd9aff31548993910efa35a', u'address': u'2000 BLOCK OF S COLLEGE AV', u'date': u'06/24/15 08:47 PM', u'type': u'Theft'}
{u'cdid': 64482189, u'lon': -96.3594859, u'lat': 30.6299681, u'link': u'http://spotcrime.com/crime/64482189-345f4eca1c977f43e97ea4981f73d4de', u'address': u'3600 BLOCK OF WELLBORN RD', u'date': u'06/24/15 07:32 PM', u'type': u'Vandalism'}
...
{u'cdid': 64370976, u'lon': -96.361556, u'lat': 30.631685, u'link': u'http://spotcrime.com/crime/64370976-dc6e6dbb29fc7376c2b82356c45d281d', u'address': u'3600 BLOCK OF WELLBORN RD #802', u'date': u'06/18/15 12:37 PM', u'type': u'Arrest'}
{u'cdid': 64371003, u'lon': -96.3539954, u'lat': 30.6434707, u'link': u'http://spotcrime.com/crime/64371003-d9934d9b9d83c1867871701874c45523', u'address': u'2900 BLOCK OF S TEXAS AVENUE', u'date': u'06/18/15 09:56 AM', u'type': u'Vandalism'}
```
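To get rows like the table in the question, only the `type`, `date` and `address` keys are needed. A minimal sketch, assuming (from the sample output only) that `date` is always month/day/two-digit-year with a 12-hour clock; `to_rows` is a hypothetical helper, and the two records are copied from the output above in Python 3 spelling.

```python
from datetime import datetime

# Two records copied from the output above (fields not needed here are omitted).
crimes = [
    {"cdid": 64482189, "address": "3600 BLOCK OF WELLBORN RD",
     "date": "06/24/15 07:32 PM", "type": "Vandalism"},
    {"cdid": 64482204, "address": "2000 BLOCK OF S COLLEGE AV",
     "date": "06/24/15 08:47 PM", "type": "Theft"},
]

# Assumed format of "date", e.g. "06/24/15 08:47 PM".
DATE_FORMAT = "%m/%d/%y %I:%M %p"


def to_rows(records):
    """Turn API records into (type, datetime, address) tuples, newest first."""
    rows = [
        (r["type"], datetime.strptime(r["date"], DATE_FORMAT), r["address"])
        for r in records
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


for crime_type, when, address in to_rows(crimes):
    print(f"{crime_type:<10} {when:%m/%d/%y %I:%M %p}  {address}")
```

Parsing the dates into `datetime` objects (rather than keeping the strings) is what makes the newest-first sort correct across months and years.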
You can't find the div because it is loaded dynamically and inserted by JavaScript. What you can do in this case, however, is replicate the AJAX request that fetches all this crime data.
It seems that their internal API doesn't require any sort of authentication, so you can send the following API request directly:
```
GET api.spotcrime.com/crimes.json?lat=30.639155&lon=-96.3647937&radius=0.02&key=spotcrime-private-api-key
```
As a bonus, you don't need to scrape the HTML at all, since everything is returned as JSON objects.
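Because the response is JSON, the standard library is enough to consume it. The following is a minimal Python 3 sketch, assuming the body has a top-level `"crimes"` list as in the answers here; `fetch_crimes`, `parse_crimes` and `count_by_type` are hypothetical helpers, and `SAMPLE_BODY` is built from the sample output in this thread so the parsing can be tried offline. The endpoint itself may no longer respond the same way.

```python
import json
from collections import Counter
from urllib.request import Request, urlopen


def parse_crimes(body):
    """Return the list under "crimes" from a crimes.json response body."""
    return json.loads(body).get("crimes", [])


def fetch_crimes(url, timeout=10):
    """Call the API URL directly (no HTML involved) and parse the JSON body."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return parse_crimes(response.read().decode("utf-8"))


def count_by_type(crimes):
    return Counter(crime["type"] for crime in crimes)


# Offline sample shaped like the records printed in this thread.
SAMPLE_BODY = """
{"crimes": [
  {"cdid": 64482204, "type": "Theft", "date": "06/24/15 08:47 PM",
   "address": "2000 BLOCK OF S COLLEGE AV"},
  {"cdid": 64482189, "type": "Vandalism", "date": "06/24/15 07:32 PM",
   "address": "3600 BLOCK OF WELLBORN RD"}
]}
"""

crimes = parse_crimes(SAMPLE_BODY)
print(count_by_type(crimes))

# Live call (network required):
# fetch_crimes("http://api.spotcrime.com/crimes.json?lat=30.639155&lon=-96.3647937"
#              "&radius=0.02&key=spotcrime-private-api-key")
```

Keeping parsing separate from fetching means the JSON handling can be tested without hitting the site.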