Reputation: 25
I am trying to scrape the data from the table with id=AWS on the following NOAA site, https://www.weather.gov/afc/alaskaObs, but when I try to find the table using '.find', the result is None. I can retrieve the parent div, but I can't seem to access the table. My code is below.
from urllib.request import urlopen
from bs4 import BeautifulSoup
# Download the page as the server sends it
html = urlopen('https://www.weather.gov/afc/alaskaObs').read()
# Find the parent div, then look for the table inside it
soup = BeautifulSoup(html, 'lxml').find("div", {"id":"obDataDiv"}).find("table", {"id":"AWS"})
print(soup)
When I try to just find the parent div, "obDataDiv", it returns the following.
<div id="obDataDiv">&nbsp;</div>
I'm pretty new to BeautifulSoup. Is this an error? Any help is appreciated, thank you!
Upvotes: 0
Views: 753
Reputation: 1346
urlopen only gives you the HTML that the server sent, not the DOM as it ends up after the page's client-side scripts have run. On your example site, the table is generated by JavaScript after the page loads, so you'll need a tool such as PhantomJS or Selenium that lets the necessary client-side JS run first.
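A rough sketch of the Selenium route, in case it helps. It assumes Selenium 4 and a local Chrome install; the helper names find_table and fetch_rendered_html are made up for illustration, and the "rendered" snippet is invented sample data standing in for the page after its script has run, not real output from the NOAA site.

```python
from bs4 import BeautifulSoup


def find_table(html, table_id="AWS"):
    """Return the <table> with the given id, or None if it is absent."""
    return BeautifulSoup(html, "html.parser").find("table", {"id": table_id})


def fetch_rendered_html(url, table_id="AWS", timeout=30):
    """Load url in headless Chrome and return the HTML after scripts have run.

    Assumption: Selenium 4 and Chrome are installed locally.
    """
    # Imported lazily so find_table() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the page's script has inserted the table.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, table_id))
        )
        return driver.page_source
    finally:
        driver.quit()


# What urlopen sees: the div is still an empty placeholder.
static_html = '<div id="obDataDiv">&nbsp;</div>'
# Invented stand-in for the page after the script has filled the div.
rendered_html = (
    '<div id="obDataDiv"><table id="AWS">'
    "<tr><td>PANC</td><td>45</td></tr></table></div>"
)

print(find_table(static_html))  # None
rows = find_table(rendered_html).find_all("tr")
print([td.get_text() for td in rows[0].find_all("td")])  # ['PANC', '45']
```

Against the live page you would call find_table(fetch_rendered_html('https://www.weather.gov/afc/alaskaObs')) instead of using the sample strings.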
Upvotes: 1
Reputation: 2321
It seems the div you extract contains just one table, so why not do something like this:
soup = BeautifulSoup(html, 'lxml').find("div", {"id":"obDataDiv"}).find("table")
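Before relying on this, it may be worth checking whether the downloaded div contains any table at all. A small sketch, using the div output quoted in the question as a stand-in for what urlopen returned (the html.parser backend is used here only to avoid the lxml dependency):

```python
from bs4 import BeautifulSoup

# The parent div exactly as the question reports it from the downloaded page.
html = '<div id="obDataDiv">&nbsp;</div>'

div = BeautifulSoup(html, "html.parser").find("div", {"id": "obDataDiv"})
# Guard against a missing div so the chained .find() cannot raise AttributeError.
table = div.find("table") if div is not None else None

if table is None:
    print("No table inside obDataDiv in the downloaded HTML")
else:
    print(table)
```

If this reports no table, dropping the id filter will not change the result, since the div has no table element to match.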
Upvotes: 0