Reputation: 6505
I'm using soup.findAll('table') to find a table in an HTML file, but it is never found. The table does exist in the file, and with a regex I can locate it like this:
import re
from bs4 import BeautifulSoup
with open(r'd:\samplefile.html', 'r') as f:
    webpage = f.read()
soup = BeautifulSoup(webpage)  # no parser given, so bs4 picks one itself
print(re.findall("TABLE", webpage))  # works, prints ['TABLE', 'TABLE']
print(soup.findAll("TABLE"))  # prints an empty list []
I know the soup is being generated correctly, because when I run:
print([tag.name for tag in soup.findAll(align=None)])
it correctly prints the tags it finds. I have also tried other spellings of "TABLE", such as "table" and "Table". And if I open the file in a text editor, "TABLE" is in it.
Why doesn't BeautifulSoup find the table?
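Part of the mismatch comes from the two searches looking at different things: re.findall scans the raw text, which keeps the file's uppercase spelling, while an HTML parser normalizes tag names to lowercase. A minimal sketch using Python 3's standard-library html.parser (the parser bs4 uses when you pass "html.parser"), with made-up sample markup and a hypothetical TagCollector class, shows that normalization:

```python
import re
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records every start-tag name as html.parser reports it."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser hands us the tag name already converted to lowercase
        self.tags.append(tag)


# Made-up sample markup written in uppercase, like the file in the question
markup = "<HTML><BODY><TABLE><TR><TD>x</TD></TR></TABLE></BODY></HTML>"

# Raw-text search keeps the original case: matches <TABLE> and </TABLE>
print(re.findall("TABLE", markup))  # ['TABLE', 'TABLE']

collector = TagCollector()
collector.feed(markup)
collector.close()
print(collector.tags)  # ['html', 'body', 'table', 'tr', 'td']
```

So a parsed tree should be searched with "table", never "TABLE". This alone does not explain why "table" also came back empty in the question; that part depends on which parser built the tree.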
Upvotes: 1
Views: 1310
Reputation: 32370
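When findAll does not return all the expected tags, or returns none at all, even though you know the tag exists in the markup, try a different parser by passing it as the second argument to the BeautifulSoup constructor. html5lib parses pages the way a web browser does and is very lenient with malformed markup (it has to be installed separately, e.g. pip install html5lib):
## BEFORE
soup = BeautifulSoup(webpage)
## AFTER
soup = BeautifulSoup(webpage, "html5lib")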
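To see which parser actually finds the table in a given file, one option is to parse the same markup with each parser and compare the counts. This is a diagnostic sketch, assuming Python 3 and bs4 4.x; count_tables_per_parser and the sample markup are hypothetical, and lxml and html5lib are only tried if they are installed:

```python
try:
    from bs4 import BeautifulSoup, FeatureNotFound
except ImportError:  # bs4 itself is not installed; the helper then reports nothing
    BeautifulSoup = None

# Parser names as bs4 accepts them; "lxml" and "html5lib" are optional installs.
PARSERS = ("html.parser", "lxml", "html5lib")


def count_tables_per_parser(markup, parsers=PARSERS):
    """Return {parser_name: number of <table> tags found} for each installed parser."""
    counts = {}
    if BeautifulSoup is None:
        return counts
    for parser in parsers:
        try:
            soup = BeautifulSoup(markup, parser)
        except FeatureNotFound:
            continue  # this parser is not installed, so skip it
        # Parsed tag names are lowercase, so search for "table", not "TABLE".
        counts[parser] = len(soup.find_all("table"))
    return counts


# Made-up, well-formed sample; pass the contents of your own file instead.
sample = "<HTML><BODY><TABLE><TR><TD>x</TD></TR></TABLE></BODY></HTML>"
print(count_tables_per_parser(sample))
```

On well-formed markup every installed parser should agree; on a broken file, the parsers whose count is 0 are the ones that lost the table while building the tree, and the one that finds it is the one to pass to the constructor.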
Upvotes: 1