nicholas

Reputation: 509

Scrape Table Data from Website

I am trying to scrape table data from a website using BeautifulSoup4 and Python, and then create an Excel document with the results. So far, I have this:

import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://opl.tmhp.com/ProviderManager/SearchResults.aspx?TPI=&OfficeHrs=4&ProgType=STAR&UCCIndicator=No+Preference&Cnty=&NPI=&Srvs=6&Age=All&Gndr=B&SortBy=Distance&ZipCd=78552&SrvsOfrd=0&SpecCd=0&Name=&CntySrvd=0&Plan=H3&WvrProg=0&SubSpecCd=0&AcptPnt=Y&Rad=200&LangCd=99').read())

for row in soup('table', {'class' : 'spad'})[0].tbody('tr'):
    tds = row('td')
    print tds[0].string, tds[1].string

But it doesn't display the data.

Any ideas?

Upvotes: 1

Views: 918

Answers (1)

kirelagin

Reputation: 13606

First of all, the table's class is StandardResultsGrid, not spad.

Second, you don't need the tbody thing. Simply use:

for row in soup('table', {'class' : 'StandardResultsGrid'})[0]('tr'):

Also note that, for some reason, the original page puts the header row inside tbody as well, so you'll have to skip the first row:

for row in soup('table', {'class' : 'StandardResultsGrid'})[0]('tr')[1:]:

And note that some cells include tables in them, so you'll have to parse the contents of the tds carefully.
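One possible way to handle the nested tables is sketched below. It is only an illustration under stated assumptions: SAMPLE_HTML is a made-up stand-in for the real page (the actual markup may differ), extract_rows is a hypothetical helper name, and the code is written for Python 3 with the html.parser backend rather than the Python 2 setup used in the question. The idea is to keep only the rows whose nearest enclosing table is the results grid, and to read only each row's direct td children.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the header row sits inside tbody, and one
# cell contains a nested table. The real page may be structured differently.
SAMPLE_HTML = """
<table class="StandardResultsGrid">
  <tbody>
    <tr><td>Name</td><td>Details</td></tr>
    <tr>
      <td>Provider A</td>
      <td><table><tr><td>Phone</td><td>555-0100</td></tr></table></td>
    </tr>
    <tr><td>Provider B</td><td>No details</td></tr>
  </tbody>
</table>
"""


def extract_rows(html):
    """Return the text of each top-level data row as a list of strings."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table', {'class': 'StandardResultsGrid'})

    # find_all('tr') also returns rows of nested tables, so keep only the
    # rows whose nearest enclosing table is the results grid itself.
    top_rows = [tr for tr in table.find_all('tr')
                if tr.find_parent('table') is table]

    rows = []
    for tr in top_rows[1:]:  # skip the header row
        # recursive=False returns only this row's own cells, not the
        # cells of any table nested inside them.
        cells = tr.find_all('td', recursive=False)
        # get_text flattens a nested table into one space-separated string.
        rows.append([td.get_text(' ', strip=True) for td in cells])
    return rows


rows = extract_rows(SAMPLE_HTML)
for r in rows:
    print(r)
```

Flattening a nested table into a single string is the simplest option; if the inner values are needed separately, the same direct-children approach could be applied to the inner table instead.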

Upvotes: 5
