sgrozz

Reputation: 25

Cannot find table using Python BeautifulSoup

I am trying to scrape the data from the table with id=AWS on the following NOAA site, https://www.weather.gov/afc/alaskaObs, but when I try to find the table using '.find', the result is None. I can retrieve the parent div, but I can't seem to access the table. Below is my code.

from urllib.request import urlopen

from bs4 import BeautifulSoup

# Download the page exactly as the server sends it
html = urlopen('https://www.weather.gov/afc/alaskaObs').read()

# Find the parent div first, then the table inside it
soup = BeautifulSoup(html, 'lxml').find("div", {"id": "obDataDiv"}).find("table", {"id": "AWS"})

print(soup)

When I try to just find the parent div, "obDataDiv", it returns the following.

<div id="obDataDiv"> </div>

I'm pretty new to BeautifulSoup. Is this an error? Any help is appreciated, thank you!

Upvotes: 0

Views: 753

Answers (2)

Matt O

Reputation: 1346

urlopen only gives you the HTML that the server sent, not the DOM as it exists after the page's client-side scripts have run. On the site in your example, the table is generated by JavaScript after the page loads, which is why the div comes back empty. You'll need a tool such as PhantomJS or Selenium that lets the necessary client-side JavaScript run first.
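
To make the difference concrete, here is a rough sketch rather than a tested solution for this particular site. It assumes Selenium 4 with Chrome available, and it assumes the rendered page really does contain a table with id "AWS" inside the div, as the question describes. The helper names render_html and find_table are made up for illustration, and the "rendered" string is a hypothetical stand-in with placeholder cell content, not real data from the site.

```python
from bs4 import BeautifulSoup


def render_html(url, wait_for_id, timeout=15):
    """Load url in a real browser and return the HTML after scripts have run."""
    # Imported lazily so find_table below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome and its driver are available
    try:
        driver.get(url)
        # Wait until the script-generated element exists in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, wait_for_id))
        )
        return driver.page_source
    finally:
        driver.quit()


def find_table(html, div_id, table_id):
    """Return the table inside the given div, or None if either is missing."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", {"id": div_id})
    if div is None:
        return None
    return div.find("table", {"id": table_id})


# What urlopen sees, as reported in the question: the div is present but empty.
served = '<div id="obDataDiv"> </div>'
# Hypothetical stand-in for the DOM after the page's scripts have run.
rendered = '<div id="obDataDiv"><table id="AWS"><tr><td>example</td></tr></table></div>'

print(find_table(served, "obDataDiv", "AWS"))    # None
print(find_table(rendered, "obDataDiv", "AWS"))  # the <table> element
```

Against the live page, the idea would be to call render_html('https://www.weather.gov/afc/alaskaObs', 'AWS') and pass the result to find_table in place of the urlopen output.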

Upvotes: 1

Sam Chats

Reputation: 2321

It seems the div you extract contains just one table. So why not do something like this:

soup = BeautifulSoup(html, 'lxml').find("div", {"id":"obDataDiv"}).find("table")
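
One way to check how many tables the div actually holds before relying on this is to list them. This is only a small sketch: tables_in_div is a made-up helper name, and the sample string is the div exactly as the question shows it coming back from urlopen.

```python
from bs4 import BeautifulSoup


def tables_in_div(html, div_id):
    """Return every <table> found inside the div with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", {"id": div_id})
    # A missing div and an empty div both yield an empty list.
    return div.find_all("table") if div is not None else []


# The div as the question reports it from urlopen.
served = '<div id="obDataDiv"> </div>'
tables = tables_in_div(served, "obDataDiv")
print(len(tables))  # 0
```

If the count is 0 for the downloaded HTML, dropping the id filter will still return None, because there is no table in the div to match.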

Upvotes: 0
