Python Web Scraping table returns None

Question

I'm trying to scrape the temperature elements of a table from www.intellicast.com

soup = BeautifulSoup(urllib2.urlopen('http://www.intellicast.com/Local/History.aspx?location=USTX0057').read())
for row in soup('table', {'id': 'dailyClimate'})[0].tbody('tr'):
    tds = row
    print tds

The result: TypeError: 'NoneType' object is not callable

When looking at the page source code, I can see:

...


So I know there is a tbody as well as a tr element.

If I change .tbody('tr') to .tbody('td') I still get an error, so I'm assuming the error is somewhere in calling tbody.

Martijn Pieters · Accepted Answer

Your browser inserts a <tbody> element, but the actual source doesn't have that element.

See Why do browsers insert tbody element into table elements?
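To make the failure concrete, here is a minimal sketch in Python 3 with bs4. It assumes a small hand-written HTML string (not the real intellicast.com page, and with invented cell values) in which the rows sit directly inside the table, the way they do in the raw source. Attribute access such as table.tbody is shorthand for a find() call, so it returns None when no such tag exists, and calling None is what raises the TypeError from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the raw page source: the rows sit directly
# inside <table>, with no <tbody> wrapper. Cell values are invented.
raw_html = """
<table id="dailyClimate">
  <tr><th>Date</th><th>Average Low</th></tr>
  <tr><td>Jan 1</td><td>36</td></tr>
</table>
"""

# html.parser (like lxml) does not add a missing <tbody>.
soup = BeautifulSoup(raw_html, "html.parser")
table = soup.find("table", id="dailyClimate")

# table.tbody behaves like table.find("tbody"); with no match it is None.
tbody = table.tbody
print(tbody)

try:
    tbody("tr")  # same as None("tr")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Checking the result of .tbody (or of find()) for None before using it is one way to catch this kind of mismatch between the browser's view and the raw source.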
You could use the html5lib parser instead (using BeautifulSoup(source, 'html5lib')), which would also insert the <tbody> element. However, you don't need to search for it, just go straight to the <tr> rows:
for row in soup.find('table', id='dailyClimate').find_all('tr'):
or using a CSS selector:
for row in soup.select('table#dailyClimate tr'):
You'd normally only select the tbody element if there were more than one, or if there was a thead or tfoot element whose rows you wanted to exclude.
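As a rough illustration of that last point, the sketch below (Python 3 with bs4) compares selecting every row with selecting only the rows inside tbody. The HTML string is a made-up sample with invented values, written here with an explicit thead and tbody; it is not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample table with an explicit <thead> and <tbody>.
# Dates and temperatures are invented for illustration only.
html = """
<table id="dailyClimate">
  <thead>
    <tr><th>Date</th><th>Average Low</th><th>Average High</th></tr>
  </thead>
  <tbody>
    <tr><td>Jan 1</td><td>36</td><td>55</td></tr>
    <tr><td>Jan 2</td><td>36</td><td>56</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Every row in the table, header row included.
all_rows = soup.select("table#dailyClimate tr")

# Only the rows inside <tbody>, which skips the <thead> row.
body_rows = soup.select("table#dailyClimate tbody tr")

# Pull the text of each <td> cell out of the body rows.
data = [
    [td.get_text(strip=True) for td in row.find_all("td")]
    for row in body_rows
]

for cells in data:
    print(cells)
```

When the source has no thead or tbody at all, an alternative is to select every tr and skip rows that contain no td cells.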

  
Table header columns from the page source: Date, Average Low, Average High, Record Low, Record High, Average Precipitation, Average Snow
