Reputation: 509
I am trying to scrape table data from a website using BeautifulSoup4 and Python, and then create an Excel document with the results. So far, I have this:
import urllib2
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen('http://opl.tmhp.com/ProviderManager/SearchResults.aspx?TPI=&OfficeHrs=4&ProgType=STAR&UCCIndicator=No+Preference&Cnty=&NPI=&Srvs=6&Age=All&Gndr=B&SortBy=Distance&ZipCd=78552&SrvsOfrd=0&SpecCd=0&Name=&CntySrvd=0&Plan=H3&WvrProg=0&SubSpecCd=0&AcptPnt=Y&Rad=200&LangCd=99').read())
for row in soup('table', {'class' : 'spad'})[0].tbody('tr'):
    tds = row('td')
    print tds[0].string, tds[1].string
But it doesn't display any of the data.
Any ideas?
Upvotes: 1
Views: 918
Reputation: 13606
First of all, the class is StandardResultsGrid, not spad.
Second, you don't need the tbody thing. Simply use:
for row in soup('table', {'class' : 'StandardResultsGrid'})[0]('tr'):
Also note that, since the header row is included in tbody in the original page for some reason, you'll have to skip the first row:
for row in soup('table', {'class' : 'StandardResultsGrid'})[0]('tr')[1:]:
And note that some cells include tables in them, so you'll have to parse the contents of the tds carefully.
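To make the points above concrete, here is a minimal Python 3 sketch that runs against an inline HTML snippet instead of the live page. The sample markup, the provider names, and the helper extract_rows are all made up for illustration; only the StandardResultsGrid class name comes from the answer. It skips the header row and uses recursive=False so that rows and cells of nested tables are not mistaken for rows of the outer table. It also reads cells with get_text() rather than .string, since .string returns None for a cell that has more than one child.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real results page.
SAMPLE_HTML = """
<table class="StandardResultsGrid">
  <tbody>
    <tr><th>Name</th><th>Address</th></tr>
    <tr>
      <td>Provider A</td>
      <td><table><tr><td>123 Main St</td></tr><tr><td>Harlingen, TX</td></tr></table></td>
    </tr>
    <tr><td>Provider B</td><td>456 Oak Ave</td></tr>
  </tbody>
</table>
"""


def extract_rows(html):
    """Return the text of each data row of the results table as a list of strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "StandardResultsGrid"})
    # Use the tbody if the markup has one, otherwise the table itself.
    body = table.find("tbody", recursive=False) or table
    # Only direct child rows, so rows of nested tables are ignored;
    # [1:] skips the header row.
    rows = body.find_all("tr", recursive=False)[1:]
    result = []
    for row in rows:
        # Only direct child cells; a nested table is flattened into one string.
        cells = row.find_all("td", recursive=False)
        result.append([cell.get_text(" ", strip=True) for cell in cells])
    return result


rows = extract_rows(SAMPLE_HTML)
for r in rows:
    print(r)
```

On the real page you would pass the downloaded HTML to extract_rows instead of SAMPLE_HTML; whether flattening nested tables into a single string is appropriate depends on what those inner tables actually contain.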
Upvotes: 5