Max Kim

Reputation: 1124

Parse table with BeautifulSoup Python

I want to read the entries in a table that follows this format:

<table cellspacing="0" cellpadding="4">

stuff

</table>

I'm using this as my current method:

pg = urllib2.urlopen(req).read()
page = BeautifulSoup(pg)
table = page.find('table', cellpadding = 4, cellspacing = 0)

This doesn't find the table tag properly. What is the best way to do this?

Upvotes: 1

Views: 878

Answers (1)

TerryA

Reputation: 59974

I've tested this with both BeautifulSoup versions 3 and 4. Your code works with BS4, so you must be using version 3.

>>> from bs4 import BeautifulSoup as BS4 # Version 4
>>> from BeautifulSoup import BeautifulSoup as BS3 # Version 3
>>> bs3soup = BS3("""<table cellspacing="0" cellpadding="4">
... 
... stuff
... 
... </table>""")
>>> bs4soup = BS4("""<table cellspacing="0" cellpadding="4">
... 
... stuff
... 
... </table>""")
>>> bs3soup.find('table', cellpadding = 4, cellspacing = 0) # None
>>> bs4soup.find('table', cellpadding = 4, cellspacing = 0)
<table cellpadding="4" cellspacing="0">

stuff

</table>

So, if you want to continue with BS3, this should fix it:

>>> bs3soup.find('table', cellpadding="4", cellspacing="0") # Note that the integers are now strings, as in the HTML.

However, you should be using version 4 (from bs4 import BeautifulSoup).
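For reference, here is a minimal sketch of the same lookup in version 4, extended to read the entries out of the table. The `tr`/`td` rows in the sample markup are hypothetical (the question only shows "stuff" inside the table), and the `html.parser` backend is just one possible choice, so adjust both to your actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page's rows and cells will differ.
html = """
<table cellspacing="0" cellpadding="4">
  <tr><td>a</td><td>b</td></tr>
  <tr><td>c</td><td>d</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Pass the attribute values as strings, the way they appear in the HTML.
table = soup.find("table", attrs={"cellpadding": "4", "cellspacing": "0"})

# Collect the text of every cell, one list per row.
rows = []
if table is not None:
    for tr in table.find_all("tr"):
        rows.append([td.get_text(strip=True) for td in tr.find_all("td")])

print(rows)
```

Checking `table is not None` before walking the rows avoids an `AttributeError` when no table matches.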

Upvotes: 1
