Reputation: 39
import urllib2
from BeautifulSoup import BeautifulSoup
contenturl = "http://espnfc.com/tables/_/league/esp.1/spanish-la-liga?cc=5901"
soup = BeautifulSoup(urllib2.urlopen(contenturl).read())
table = soup.find('div', attrs={'class': 'content'})
rows = soup.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = td.find(text=True)
        print text,
    print
and I get the output below. This is only a small part of it; what I'm actually after is the league standings table.
Overall None Home None Away None
POS None TEAM P W D L F A None W D L F A None W D L F A None GD Pts
1
Barcelona 38 32 4 2 115 40 None 18 1 0 63 15 None 14 3
My question is: why is there a "None" after every word, and is there a way to make it stop doing that?
Upvotes: 1
Views: 191
Reputation: 8043
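The None values come from cells that have no text in them: td.find(text=True) returns None when it finds nothing to match. (The docs describe a related case, where .string is None because an element has multiple children.)
The easiest way to get rid of the None values is to skip them: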
for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = td.find(text=True)
        if text is not None:
            print text,
    print
This checks whether text is None and, if it is, doesn't print it.
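To see where the None values come from without fetching the live page, here is a small sketch that uses Python 3's standard-library html.parser instead of BeautifulSoup (a swapped-in technique, not the code above). The HTML snippet is hypothetical, modelled on the question's output with one empty spacer cell, and the helper first_text_per_cell is also hypothetical. It records the first text in each td, or None when the cell is empty, and then applies the same "is not None" filter as the answer.

```python
from html.parser import HTMLParser

# Hypothetical markup modelled on the question's output: an empty
# spacer cell (<td width="10"></td>) sits between two groups of stats.
SAMPLE = (
    "<table><tr>"
    "<td>1</td><td>Barcelona</td><td>38</td>"
    '<td width="10"></td>'
    "<td>18</td>"
    "</tr></table>"
)


class FirstTextPerCell(HTMLParser):
    """For each <td>, record its first non-blank text, or None if it has none."""

    def __init__(self):
        super().__init__()
        self.cells = []       # one entry per <td>, in document order
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append(None)  # stays None unless text shows up

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Keep only the first non-blank text node of the current cell.
        if self._in_td and self.cells[-1] is None and data.strip():
            self.cells[-1] = data.strip()


def first_text_per_cell(html):
    """Hypothetical helper: parse html and return the per-cell texts."""
    parser = FirstTextPerCell()
    parser.feed(html)
    parser.close()
    return parser.cells


cells = first_text_per_cell(SAMPLE)
# Printing everything reproduces the stray None from the empty spacer cell.
print(" ".join(str(t) for t in cells))  # 1 Barcelona 38 None 18
# Skipping None, as in the answer, leaves only the real values.
print(" ".join(t for t in cells if t is not None))  # 1 Barcelona 38 18
```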
Upvotes: 0
Reputation: 59984
If you look at the page, you'll see spacer columns between some of the groups of stats, and each spacer is a td of its own.
All of those spacer cells have a width attribute, so you can leave them out like this:
cols = tr.findAll('td', width=None)
If you decide to swap to BeautifulSoup 4 at any stage, use:
cols = tr.findAll('td', width=False)
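The width filter can be sketched the same way with Python 3's standard-library html.parser instead of BeautifulSoup (a swapped-in technique, used here only so the example runs without extra packages). The snippet is hypothetical and, as the answer assumes, only the spacer cell carries a width attribute; if a real data cell also had one, this approach would drop it too.

```python
from html.parser import HTMLParser

# Hypothetical markup: the spacer cell is the only <td> with a width attribute.
SAMPLE = (
    "<table><tr>"
    "<td>1</td><td>Barcelona</td><td>38</td>"
    '<td width="10"></td>'
    "<td>18</td>"
    "</tr></table>"
)


class CellsWithoutWidth(HTMLParser):
    """Collect the text of every <td> that has no width attribute."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._collecting = False  # True while inside a <td> we keep

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # attrs is a list of (name, value) pairs for this tag.
            has_width = any(name == "width" for name, _ in attrs)
            self._collecting = not has_width
            if self._collecting:
                self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._collecting = False

    def handle_data(self, data):
        # Append text only while inside a kept (width-less) cell.
        if self._collecting:
            self.cells[-1] += data.strip()


parser = CellsWithoutWidth()
parser.feed(SAMPLE)
parser.close()
print(" ".join(parser.cells))  # 1 Barcelona 38 18
```

Because the spacer cells are never collected, there is nothing to filter out afterwards, which is the same idea as passing width=None (or width=False in BeautifulSoup 4) to findAll.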
Upvotes: 1