BeautifulSoup not finding all

Question

I've got the following source code that tries to parse a web page; however, it does not seem to find all the elements with the class "row dataraekker":

import requests
from bs4 import BeautifulSoup

cvr = 45963128
url = 'https://datacvr.virk.dk/data/visenhed?enhedstype=virksomhed&id=%s&soeg=%s' % (str(cvr), str(cvr))

rObject = requests.get(url)
html = rObject.content
soup = BeautifulSoup(html, 'html.parser')
# Locate the "Historisk" accordion, then collect its data rows
registerHistoryTab = soup.find('div', class_="accordion ", id="accordion-Historisk")
dataRows = registerHistoryTab.find_all('div', class_='row dataraekker')
print(len(dataRows))

registerHistoryTab ends up holding only 2 items, with the following HTML, in which several divs appear "out of nowhere"; they are not there in the page's source code:

Registreringshistorik

04.06.2015  Ændring i personkreds

CVR-nummer:45963128.
NAVN:UNILEVER DANMARK A/S.
Adresse: Ørestads Boulevard 73, 2300 København S.
Kommune: København.
Bestyrelse:
Fratrådte:
Jens Christian Voldmester, den 01.06.2015.
Direktion:
Fratrådte:
Jens Christian Voldmester, (adm. dir), den 01.06.2015.
Tiltrådte:
Henrico Drent, (adm. dir), Burgemeester Vogelslaan 63, 5062 KN, Oisterwijk, Holland, den 01.06.2015.
 



06.03.2015  Øvrige ændringer, Ændring i personkreds

CVR-nummer: 45963128
Navn og adresse: 

UNILEVER  DANMARK A/S

The issue appears at the find call, because registerHistoryTab does not match what I see when viewing the web page.

Any help appreciated

alecxe · Accepted Answer

The issue appears at the find call, because registerHistoryTab does not match what I see when viewing the web page.

Never expect the HTML returned by requests to be the same as what you see in the browser. When you deal with HTML parsing, work with what you've actually got inside the response, not with what you see in the browser.
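One quick way to check what is really in the response is to count the class attribute in the raw markup, without going through any parser. The sketch below is only an illustration: count_class_attr and the sample string are made-up names, and the regex is a rough check that only matches the exact quoted attribute value.

```python
import re

def count_class_attr(raw_html, class_value):
    """Count literal class="..." attributes in raw markup, bypassing any parser."""
    pattern = r'class\s*=\s*["\']' + re.escape(class_value) + r'["\']'
    return len(re.findall(pattern, raw_html))

# Hypothetical stand-in for the response body (rObject.text in the question)
sample = '<div class="row dataraekker">a</div><p><div class="row dataraekker">b</div>'
print(count_class_attr(sample, "row dataraekker"))
```

If the raw count is clearly higher than what find_all returns, the rows are present in the response and it is the parser's tree that is losing them.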

Note that in this case, just switching the parser from html.parser to lxml solves the problem:

soup = BeautifulSoup(html, 'lxml')

Now I see 64 printed instead of 2.

Note that this requires lxml to be installed: pip install --upgrade lxml.
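To see how much the parser choice matters for a given document, you can run the same lookup with every parser you have installed. This is a rough sketch, not the exact code I ran: count_rows and the sample markup are hypothetical, and the sample is well-formed, so all parsers should agree on it. The differences show up on messy real-world pages like the one in the question.

```python
from bs4 import BeautifulSoup, FeatureNotFound

def count_rows(html, parser):
    """Count 'row dataraekker' divs inside the history accordion for one parser."""
    soup = BeautifulSoup(html, parser)
    tab = soup.find('div', id='accordion-Historisk')
    if tab is None:
        return 0
    return len(tab.find_all('div', class_='row dataraekker'))

# Hypothetical, well-formed stand-in for the downloaded page
sample = (
    '<div class="accordion" id="accordion-Historisk">'
    '<div class="row dataraekker">first</div>'
    '<div class="row dataraekker">second</div>'
    '</div>'
)

for parser in ('html.parser', 'lxml', 'html5lib'):
    try:
        print(parser, count_rows(sample, parser))
    except FeatureNotFound:
        # Raised by BeautifulSoup when the requested parser is not installed
        print(parser, 'not installed')
```

Pass the real response content instead of sample to compare the counts on the actual page.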

Also see:

Differences between parsers
