BeautifulSoup not finding tr id

Question

I've run this web scraping exercise using the requests and BeautifulSoup modules in Python 2.7.12. My problem is that I can't get the soup object to return a specific tr by its id, nor a few other HTML elements with ids that I picked at random, including the ones in the print statements below. Any idea why this isn't working? Any help would be greatly appreciated.

import requests
from bs4 import BeautifulSoup as bs

head= {
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36',
'Content-Type': 'text/html',}

r = requests.get('http://www.iii.co.uk/investment/detail?code=cotn:LSE:SEE&display=discussion', headers=head)

r_text = r.text
soup = bs(r_text, "html.parser")

print soup.find("tr",id="disc1-12056888")
print soup.find('table', id='discussion-list')

宏杰李 · Accepted Answer

I believe html.parser is unreliable in Python 2. Use lxml or html5lib instead:

soup = bs(r_text, "lxml")

This quote is from the BeautifulSoup documentation:

If you can, I recommend you install and use lxml for speed. If you’re using a version of Python 2 earlier than 2.7.3, or a version of Python 3 earlier than 3.2.2, it’s essential that you install lxml or html5lib–Python’s built-in HTML parser is just not very good in older versions.
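A minimal sketch of applying this advice, assuming bs4 is installed (lxml and html5lib are optional). It tries lxml, then html5lib, then the built-in html.parser, skipping any parser whose library is missing, and reports which one found the element. The names find_by_id and PARSERS are hypothetical helpers for illustration, not part of BeautifulSoup; FeatureNotFound is the exception bs4 raises when a requested parser isn't available. If no parser finds the element, the parser is probably not the problem, and it is worth checking whether the id appears in r.text at all.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical preference order: lxml and html5lib are third-party and may
# not be installed; html.parser ships with Python and is the last resort.
PARSERS = ("lxml", "html5lib", "html.parser")


def find_by_id(html, tag, element_id, parsers=PARSERS):
    """Return (parser_name, element) for the first parser that finds
    <tag id="element_id">, or (None, None) if none of them do."""
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            # This parser's library isn't installed; try the next one.
            continue
        element = soup.find(tag, id=element_id)
        if element is not None:
            return parser, element
    return None, None


if __name__ == "__main__":
    # Hypothetical sample markup standing in for the downloaded page (r.text).
    sample = """
    <table id="discussion-list">
      <tr id="disc1-12056888"><td>first post</td></tr>
    </table>
    """
    parser, row = find_by_id(sample, "tr", "disc1-12056888")
    if row is not None:
        print("found with %s: %s" % (parser, row["id"]))
    else:
        print("no parser found the element")
```

With the question's code, the call would be find_by_id(r.text, "tr", "disc1-12056888").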
