Parse table with BeautifulSoup Python

Question

I want to read the entries in a table that follows this format:

<table cellpadding="4" cellspacing="0">
stuff
</table>

I'm using this as my current method:

pg = urllib2.urlopen(req).read()
page = BeautifulSoup(pg)
table = page.find('table', cellpadding = 4, cellspacing = 0)

This call doesn't find the table tag properly. What is the best way to do this?

TerryA · Accepted Answer

I've tested this with both BeautifulSoup versions 3 and 4. Your code works with BS4, so you must be using version 3.

>>> from bs4 import BeautifulSoup as BS4 # Version 4
>>> from BeautifulSoup import BeautifulSoup as BS3 # Version 3
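>>> bs3soup = BS3("""
... <table cellpadding="4" cellspacing="0">
... stuff
... </table>
... """)
>>> bs4soup = BS4("""
... <table cellpadding="4" cellspacing="0">
... stuff
... </table>
... """)
>>> bs3soup.find('table', cellpadding = 4, cellspacing = 0) # None
>>> bs4soup.find('table', cellpadding = 4, cellspacing = 0)
<table cellpadding="4" cellspacing="0">
stuff
</table>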

So, if you want to continue with BS3, this should fix it:

>>> soup.find('table', cellpadding="4", cellspacing="0") # Notice how the integers are now strings, like in the HTML.

However, you should be using version 4 (from bs4 import BeautifulSoup).
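
For reference, here is a minimal sketch of the same lookup with version 4, followed by reading the cell text out of the table. The sample markup, including its rows and cells, is made up for illustration since the question only shows "stuff" inside the table. The "html.parser" argument and the `rows` variable are likewise my own choices, not something from the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the real page's rows and cells are not shown
# in the question, so these <tr>/<td> elements are assumptions.
html = """
<table cellpadding="4" cellspacing="0">
  <tr><td>a</td><td>b</td></tr>
  <tr><td>c</td><td>d</td></tr>
</table>
"""

# Naming the parser explicitly avoids depending on whichever one is installed.
soup = BeautifulSoup(html, "html.parser")

# Attribute values are passed as strings, the same way they appear in the HTML.
table = soup.find("table", cellpadding="4", cellspacing="0")

# Collect the text of every cell, one list per table row.
rows = []
if table is not None:
    for tr in table.find_all("tr"):
        rows.append([td.get_text(strip=True) for td in tr.find_all("td")])

print(rows)
```

Passing the attribute values as strings keeps the call independent of how a given version converts non-string filters, so the same lookup style works whether or not integers happen to be accepted.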
