Reputation: 21
I recently started learning Python, and one of my first projects was to scrape updates from my son's classroom web page and notify me when the site was updated. This turned out to be an easy project, so I wanted to expand on it and create a script that would automatically check whether any of our lotto numbers hit. Unfortunately, I haven't been able to figure out how to get the data from the website. Here is one of my attempts from last night.
from bs4 import BeautifulSoup
import urllib.request
webpage = "http://www.masslottery.com/games/lottery/large-winningnumbers.html"
websource = urllib.request.urlopen(webpage)
soup = BeautifulSoup(websource.read(), "html.parser")
span = soup.find("span", {"id": "winning_num_0"})
print(span)
Output is here...
<span id="winning_num_0"></span>
The output listed above is also what I see if I "View Source" in a web browser. When I use "Inspect Element" in the browser, I can see the winning numbers in the inspector panel. Unfortunately, I'm not even sure how or where the browser is getting the data. Is it loading from another page, or from a script running in the background? I thought the following tutorial was going to help me, but I wasn't able to get the data using similar commands.
Any help is appreciated. Thanks
Upvotes: 1
Views: 46566
Reputation: 51797
If you look closely at the source of the page (I just used curl) you can see this block:
<script type="text/javascript">
// <![CDATA[
var dataPath = '../../';
var json_filename = 'data/json/games/lottery/recent.json';
var games = new Array();
var sessions = new Array();
// ]]>
</script>
That recent.json stuck out like a sore thumb (I actually missed the dataPath part at first).
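As a rough sketch of how those two variables turn into a URL you can fetch: this assumes the page's JavaScript concatenates dataPath and json_filename and resolves the result relative to the page's own URL. That is my guess at what the script does, not something shown in the block above. The helper names js_string_var and resolve_json_url are made up for illustration.

```python
import re
from urllib.parse import urljoin

PAGE_URL = "http://www.masslottery.com/games/lottery/large-winningnumbers.html"

# The two relevant lines from the inline script block in the page source.
SCRIPT = """
var dataPath = '../../';
var json_filename = 'data/json/games/lottery/recent.json';
"""


def js_string_var(source, name):
    """Return the value of a single-quoted JS string variable, or None."""
    pattern = r"var\s+" + re.escape(name) + r"\s*=\s*'([^']*)'"
    match = re.search(pattern, source)
    return match.group(1) if match else None


def resolve_json_url(page_url, script_source):
    """Join dataPath + json_filename and resolve against the page URL.

    Assumption: the site builds the path this way; the script that
    actually does it is not shown in the page snippet.
    """
    data_path = js_string_var(script_source, "dataPath")
    filename = js_string_var(script_source, "json_filename")
    if data_path is None or filename is None:
        raise ValueError("expected variables not found in script source")
    return urljoin(page_url, data_path + filename)


json_url = resolve_json_url(PAGE_URL, SCRIPT)
print(json_url)
```

The '../../' climbs out of /games/lottery/ to the site root, which is why the JSON ends up directly under /data/.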
After giving that a try, I came up with this:
curl http://www.masslottery.com/data/json/games/lottery/recent.json
Which, as lari points out in the comments, is way easier than scraping HTML. This easy, in fact:
import json
import urllib.request
from pprint import pprint
websource = urllib.request.urlopen('http://www.masslottery.com/data/json/games/lottery/recent.json')
data = json.loads(websource.read().decode())
pprint(data)
data is now a dict, and you can do whatever kind of dict-like things you'd like to do with it. And good luck ;)
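If you're not sure what's inside the dict yet, something like this can print an outline of its keys and value types before you start picking out numbers. It's only a sketch: the describe helper is hypothetical, and the sample below is made-up data standing in for the real response, whose actual layout I'm not showing here.

```python
import json


def describe(value, depth=0, max_depth=2):
    """Return lines outlining the keys and value types of parsed JSON."""
    indent = "  " * depth
    lines = []
    if isinstance(value, dict):
        for key, item in value.items():
            lines.append(f"{indent}{key}: {type(item).__name__}")
            if depth < max_depth:
                lines.extend(describe(item, depth + 1, max_depth))
    elif isinstance(value, list) and value:
        # Only look at the first element; lists of records usually share a shape.
        lines.append(f"{indent}[0]: {type(value[0]).__name__} ({len(value)} items)")
        if depth < max_depth:
            lines.extend(describe(value[0], depth + 1, max_depth))
    return lines


# Made-up sample, NOT the real structure of recent.json.
sample = json.loads(
    '{"games": [{"name": "Example", "numbers": [1, 2, 3]}], "updated": "today"}'
)
outline = describe(sample)
print("\n".join(outline))
```

Swap sample for the data you loaded from the site and raise max_depth if the interesting parts are nested deeper.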
Upvotes: 2