Python requests is not extracting all elements

Question

I am trying to extract TR data from the following page: http://www.datasheetcatalog.com/catalog/p1342320.shtml

I am using requests and BeautifulSoup. However, I don't get all the rows (only 12 instead of 22 from the second table). Does anybody have an explanation for this, given that the rows are there when I print response.content?

Here is the code I am using:

from bs4 import BeautifulSoup
import requests

session = requests.Session()

url = 'http://www.datasheetcatalog.com/catalog/p1342320.shtml'
response = session.get(url)

soup = BeautifulSoup(response.content, "lxml")

trs = soup.find_all('table')[8].find_all('tr')
print(len(trs))

Lyesgigs · Accepted Answer

After a detailed examination of the HTML page, I found that BeautifulSoup stopped parsing after hitting HTML comments (<!-- -->). So the solution is to change the parser from "lxml" to "html5lib":

soup = BeautifulSoup(response.content, "html5lib")
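
To check whether the parser is the cause on some other page, it can help to count the rows each parser produces from the same markup, since Beautiful Soup's parsers can build different trees from invalid HTML. The following is only a rough sketch: count_rows, compare_parsers and SAMPLE are hypothetical names made up for this illustration, and the sample markup is well-formed, so it does not reproduce the problem on the page above. html5lib and lxml have to be installed separately (for example with pip); parsers that are missing are reported as None.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def count_rows(html, parser, table_index=0):
    """Return the number of <tr> elements inside the table at table_index."""
    soup = BeautifulSoup(html, parser)
    tables = soup.find_all("table")
    if table_index >= len(tables):
        # The parser did not produce that many tables.
        return 0
    return len(tables[table_index].find_all("tr"))


def compare_parsers(html, table_index=0,
                    parsers=("html.parser", "lxml", "html5lib")):
    """Map each parser name to its row count, or None if it is not installed."""
    counts = {}
    for parser in parsers:
        try:
            counts[parser] = count_rows(html, parser, table_index)
        except FeatureNotFound:
            # Beautiful Soup raises this when the parser library is missing.
            counts[parser] = None
    return counts


# Hypothetical sample markup, used only to show the call.
SAMPLE = "<table><tr><td>a</td></tr><!-- note --><tr><td>b</td></tr></table>"
print(compare_parsers(SAMPLE))
```

For the page in the question, the call would be compare_parsers(response.content, table_index=8). If the counts differ between parsers, the markup is probably invalid somewhere around that table, and switching to the parser that returns the expected count is a reasonable fix.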
