Reputation: 1124
I want to read entries in a table that follows this format:
<table cellspacing="0" cellpadding="4">
stuff
</table>
I'm using this as my current method:
pg = urllib2.urlopen(req).read()
page = BeautifulSoup(pg)
table = page.find('table', cellpadding = 4, cellspacing = 0)
My table variable can't read the tag properly. What is the best way to do this?
Upvotes: 1
Views: 878
Reputation: 59974
I've tested this with both BeautifulSoup versions 3 and 4. Your code works with BS4, so you must be using version 3.
>>> from bs4 import BeautifulSoup as BS4 # Version 4
>>> from BeautifulSoup import BeautifulSoup as BS3 # Version 3
>>> bs3soup = BS3("""<table cellspacing="0" cellpadding="4">
...
... stuff
...
... </table>""")
>>> bs4soup = BS4("""<table cellspacing="0" cellpadding="4">
...
... stuff
...
... </table>""")
>>> bs3soup.find('table', cellpadding = 4, cellspacing = 0) # None
>>> bs4soup.find('table', cellpadding = 4, cellspacing = 0)
<table cellpadding="4" cellspacing="0">
stuff
</table>
So, if you want to continue with BS3, this should fix it:
>>> bs3soup.find('table', cellpadding="4", cellspacing="0") # Notice how the integers are now strings, like in the HTML.
However, you should be using version 4 (from bs4 import BeautifulSoup).
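For reference, here is a minimal sketch of a lookup that should not depend on how the version treats integers, since it passes the attribute values as strings through the attrs dictionary. It assumes Python 3 with bs4 and the built-in html.parser; the rows and cells in the sample markup are made up for illustration and are not from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the rows and cells are invented for this example.
html = """
<table cellspacing="0" cellpadding="4">
<tr><td>a</td><td>b</td></tr>
<tr><td>c</td><td>d</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Pass attribute values as strings, exactly as they appear in the HTML.
table = soup.find("table", attrs={"cellpadding": "4", "cellspacing": "0"})

# Collect the text of every cell, row by row.
rows = [[td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in table.find_all("tr")]
print(rows)
```

Once the table is found, find_all('tr') and find_all('td') are one way to walk the entries; adjust the tag names if your table uses th cells.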
Upvotes: 1