Bs4 gets only a partial set of the html tags

Question

I am scraping html that contains the .... (some 5K more lines)

(An interesting discrepancy: in the page source, the last line of the option list carries just the bare attribute selected, while the console shows the same attribute as selected="".)

Matthew Gaiser · Accepted Answer

We verified that the code correctly identifies the selected option, even with a large number of options. For testing, the options were pasted in as a string.

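from bs4 import BeautifulSoup

# Paste the HTML under test between the quotes (the markup is elided in this post).
content = '''
 '''
soup = BeautifulSoup(content, 'lxml')
select = soup.find('select', {'id': 'id_document'})
if select is None:
    # find() returns None when the tag is missing from content
    print("select with id 'id_document' not found")
else:
    for option in select.find_all('option'):
        if option.has_attr('selected'):
            print("found")
            print(option.get_text())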
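
On the discrepancy noted in the question: a bare selected and selected="" are two spellings of the same HTML boolean attribute, so has_attr('selected') should match both. A minimal sketch follows, using a made-up three-option snippet (not the asker's page) and the built-in html.parser instead of lxml so that no extra parser needs to be installed:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: option B uses the bare attribute, option C uses selected=""
html = (
    '<select id="id_document">'
    '<option value="1">A</option>'
    '<option value="2" selected>B</option>'
    '<option value="3" selected="">C</option>'
    '</select>'
)

soup = BeautifulSoup(html, "html.parser")
options = soup.find("select", {"id": "id_document"}).find_all("option")

# Collect the text of every option that carries the attribute, in either spelling
selected = [opt.get_text() for opt in options if opt.has_attr("selected")]
print(selected)
```

Both B and C are reported here, which suggests the difference between the page source and the console is cosmetic and unlikely to be the cause of missing tags.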

In our chat, we determined that the problem likely lies in fetching the HTML from the website rather than in processing it. Anyone else who stumbles upon this should check that their request.content (or whatever variable holds the fetched content) actually contains the information they wish to scrape.
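
One way to run that check before blaming the parser is sketched below. The helper name has_target_select and both sample strings are hypothetical. In practice the html argument would be whatever body your HTTP client returned. The sketch uses html.parser rather than lxml to stay dependency-light.

```python
from bs4 import BeautifulSoup


def has_target_select(html, select_id="id_document"):
    """Return (found, option_count) for the <select> with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    select = soup.find("select", {"id": select_id})
    if select is None:
        return False, 0
    return True, len(select.find_all("option"))


# Hypothetical stand-ins for a fetched body: one complete, one missing the list
full_page = (
    '<select id="id_document">'
    '<option>A</option><option selected>B</option>'
    '</select>'
)
partial_page = "<html><body><p>Please log in</p></body></html>"

print(has_target_select(full_page))
print(has_target_select(partial_page))
```

If the fetched content reports no select element, or far fewer options than the browser shows, the response itself is incomplete and the parsing code is not at fault.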
