Reputation: 55
I've been trying to scrape an HTML table with Python, but I can't get it to print. Bear with me: I've only just started using Python (2 days in) and have barely scratched the surface. This is also my first Stack Overflow post, so I'll try to make it as descriptive as possible.
This question has probably been asked before; I'm sorry if so.
Anyway, here's the code:
import urllib2
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen('http://premierleague.com/en-gb/matchday/league-table.html').read())
for row in soup('table', {'class': 'leagueTable'})[0].tbody('tr'):
    tds = row('td')
http://premierleague.com/en-gb/matchday/league-table.html
I'm still weak at Python and I'm not sure the code is right for this kind of scrape, but as far as I can tell the problem is the print step: I've tried several ways of printing and none of them work.
Upvotes: 2
Views: 370
Reputation: 474201
Make it simpler: use a CSS selector to get to the desired rows, i.e. the tr elements with the club-row class located inside the table with the leagueTable class. For each row, get the text of all the cells. Working example:
import urllib2
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen('http://www.premierleague.com/en-gb/matchday/league-table.html'))
# select every club row inside the league table
for row in soup.select("table.leagueTable tr.club-row"):
    # collect the stripped text of each cell in the row
    cells = [cell.get_text(strip=True) for cell in row.find_all('td')]
    print cells
Prints:
[u'1', u'', u'(1)', u'Manchester City', u'5', u'5', u'0', u'0', u'11', u'0', u'11', u'15']
[u'2', u'', u'(2)', u'Leicester City', u'5', u'3', u'2', u'0', u'11', u'7', u'4', u'11']
[u'3', u'', u'(3)', u'Manchester United', u'5', u'3', u'1', u'1', u'6', u'3', u'3', u'10']
[u'4', u'', u'(4)', u'Arsenal', u'5', u'3', u'1', u'1', u'5', u'3', u'2', u'10']
[u'5', u'', u'(10)', u'West Ham United', u'5', u'3', u'0', u'2', u'11', u'6', u'5', u'9']
[u'6', u'', u'(5)', u'Crystal Palace', u'5', u'3', u'0', u'2', u'8', u'6', u'2', u'9']
[u'7', u'', u'(6)', u'Everton', u'5', u'2', u'2', u'1', u'8', u'5', u'3', u'8']
[u'8', u'', u'(7)', u'Swansea City', u'5', u'2', u'2', u'1', u'7', u'5', u'2', u'8']
[u'9', u'', u'(8)', u'Norwich City', u'5', u'2', u'1', u'2', u'8', u'9', u'-1', u'7']
[u'10', u'', u'(9)', u'Liverpool', u'5', u'2', u'1', u'2', u'3', u'6', u'-3', u'7']
[u'11', u'', u'(11)', u'Southampton', u'5', u'1', u'3', u'1', u'5', u'5', u'0', u'6']
[u'12', u'', u'(12)', u'Tottenham Hotspur', u'5', u'1', u'3', u'1', u'4', u'4', u'0', u'6']
[u'13', u'', u'(13)', u'Watford', u'5', u'1', u'3', u'1', u'3', u'4', u'-1', u'6']
[u'14', u'', u'(14)', u'West Bromwich Albion', u'5', u'1', u'2', u'2', u'3', u'6', u'-3', u'5']
[u'15', u'', u'(15)', u'Aston Villa', u'5', u'1', u'1', u'3', u'6', u'8', u'-2', u'4']
[u'16', u'', u'(16)', u'Bournemouth', u'5', u'1', u'1', u'3', u'6', u'9', u'-3', u'4']
[u'17', u'', u'(17)', u'Chelsea', u'5', u'1', u'1', u'3', u'7', u'12', u'-5', u'4']
[u'18', u'', u'(19)', u'Stoke City', u'5', u'0', u'2', u'3', u'3', u'7', u'-4', u'2']
[u'19', u'', u'(20)', u'Sunderland', u'5', u'0', u'2', u'3', u'6', u'11', u'-5', u'2']
[u'20', u'', u'(18)', u'Newcastle United', u'5', u'0', u'2', u'3', u'2', u'7', u'-5', u'2']
And now we can clearly see - that's a terrible start for Chelsea.
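The example above is Python 2 (urllib2 and the print statement). For anyone on Python 3, here is a minimal sketch of the same selector approach. To keep it runnable without hitting the network, it parses a small inline table: SAMPLE_HTML and the extract_rows helper are hypothetical names made up for illustration, and only the leagueTable and club-row class names come from the answer. The live page's markup may well have changed since this was written.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: only the class names (leagueTable, club-row)
# are taken from the answer; the rest is invented for illustration.
SAMPLE_HTML = """
<table class="leagueTable">
  <thead><tr><th>Pos</th><th>Club</th><th>Pts</th></tr></thead>
  <tbody>
    <tr class="club-row"><td>1</td><td> Manchester City </td><td>15</td></tr>
    <tr class="club-row"><td>2</td><td>Leicester City</td><td>11</td></tr>
    <tr class="other-row"><td colspan="3">expanded details</td></tr>
  </tbody>
</table>
"""


def extract_rows(html):
    """Return one list of stripped cell texts per club row."""
    # Naming the parser explicitly avoids bs4's "no parser specified" warning.
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Same CSS selector as in the answer: club rows inside the league table.
    for row in soup.select("table.leagueTable tr.club-row"):
        rows.append([cell.get_text(strip=True) for cell in row.find_all("td")])
    return rows


if __name__ == "__main__":
    # On Python 3 the live page would be fetched with urllib.request instead
    # of urllib2, e.g. html = urllib.request.urlopen(url).read()
    for cells in extract_rows(SAMPLE_HTML):
        print(cells)
```

Note that the loop in the question only assigns tds and never prints anything, so producing no output is what that code would be expected to do; the print call inside the loop is what makes the rows show up.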
Upvotes: 1