Javier Galindo

Reputation: 39

Why is this BeautifulSoup code outputting "None"?

import urllib2
from BeautifulSoup import BeautifulSoup

contenturl = "http://espnfc.com/tables/_/league/esp.1/spanish-la-liga?cc=5901"
soup = BeautifulSoup(urllib2.urlopen(contenturl).read())

table = soup.find('div', attrs={'class': 'content'})

rows = soup.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = td.find(text=True)
        print text,  
    print

And I get the following (this is only a small part of the output; what I'm after are the standings for a soccer league):

  Overall None Home None Away None  
POS None TEAM P W D L F A None W D L F A None W D L F A None GD Pts
1 
Barcelona 38 32 4 2 115 40 None 18 1 0 63 15 None 14 3 

My question is: why is there a "None" after every word? Is there a way to make it stop?

Upvotes: 1

Views: 191

Answers (2)

Serial

Reputation: 8043

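The None appears whenever td.find(text=True) finds no text inside a cell, which looks to be the case for the empty spacer cells in this table. The docs describe a related case: .string is None when an element has multiple children.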

The easiest way to get rid of the None is like this:

for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = td.find(text=True)
        if text is not None:
            print text,  
    print  

This checks whether text is None and, if it is, skips printing it.
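
To see where the None comes from without fetching the page, here is a minimal sketch on a made-up row. It assumes BeautifulSoup 4 on Python 3 (the question uses BeautifulSoup 3 on Python 2), and both the SAMPLE_ROW markup and the cell_texts helper are hypothetical stand-ins, not the real page's markup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one row of the standings table:
# two data cells separated by an empty spacer cell.
SAMPLE_ROW = '<table><tr><td>Barcelona</td><td width="10"></td><td>38</td></tr></table>'


def cell_texts(html, skip_empty=True):
    """Return the first text node of each td, optionally dropping empty cells."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for td in soup.find_all("td"):
        text = td.find(string=True)  # None when the cell holds no text node
        if text is None and skip_empty:
            continue
        texts.append(text)
    return texts


# Without the check, the empty spacer cell shows up as None.
print(cell_texts(SAMPLE_ROW, skip_empty=False))
# With the check, only the cells that contain text remain.
print(cell_texts(SAMPLE_ROW))
```

The skip_empty flag is only there to show both behaviours side by side; in the loop above, the "if text is not None" check plays the same role.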

Upvotes: 0

TerryA

Reputation: 59984

If you look at the website, there are gaps between some groups of data, and each of these gaps is its own td in the row.

You may notice that all of these spacer cells have a width attribute. So you can select only the cells without one:

cols = tr.findAll('td', width=None)

If you decide to swap to BeautifulSoup 4 at any stage, use:

cols = tr.findAll('td', width=False)
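
As a rough check of the BeautifulSoup 4 form, the sketch below runs it on a made-up row under Python 3. The markup and the data_cells name are assumptions for illustration only; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical row: the spacer cell carries a width attribute,
# the data cells do not.
SAMPLE_ROW = '<table><tr><td>Barcelona</td><td width="10"></td><td>38</td></tr></table>'

soup = BeautifulSoup(SAMPLE_ROW, "html.parser")
tr = soup.find("tr")

# width=False keeps only the td tags that have no width attribute.
data_cells = tr.find_all("td", width=False)
print([td.get_text() for td in data_cells])
```

This filters the spacers out at selection time, so no None check is needed when printing, provided every spacer cell really does carry a width attribute.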

Upvotes: 1
