Python/Beautiful Soup find particular heading output full div

Question

I'm attempting to parse a very large HTML document that looks something like this:



    part 1
    insert text here
    crazy table thing here

    part 2
    insert text here
    crazy table thing here

I need to parse out the second div based on its h2 having the text "Part 2". I was able to break out all divs with:

divTag = soup.find("div", {"id": "reportsubsection"})

but didn't know how to narrow it down from there. From other posts I worked out how to find the specific text "part 2", but I need to be able to output the whole div section it is contained in.

EDIT/UPDATE

OK, sorry, but I'm still a little lost. Here is what I've got now. I feel like this should be so much simpler than I'm making it. Thanks again for all the help.

divTag = soup.find("div", {"id": "reportsubsection"})

for reportsubsection in soup.select('div#reportsubsection #reportsubsection'):
    if not reportsubsection.findAll('h2', text=re.compile('Finding')):
        continue

print divTag

Martijn Pieters · Accepted Answer

You can always go back up after finding the right h2, or you can test all subsections:

for subsection in soup.select('div#reportsubsection #subsection'):
    if not subsection.find('h2', text=re.compile('part 2')):
        continue
    # do something with this subsection

This uses a CSS selector to locate all subsections.
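As a rough, self-contained sketch of the selector approach above: the markup below is only a guess at the structure (the tags in the question's sample did not survive), so the nesting, the repeated `subsection` id, and the `matches` name are all assumptions for illustration. It also uses `string=`, which is the newer name for the `text=` argument in Beautiful Soup 4.4 and later.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: an outer div#reportsubsection wrapping several
# inner div#subsection blocks. The real document may be laid out differently.
html = """
<div id="reportsubsection">
  <div id="subsection">
    <h2> part 1 </h2>
    <p>insert text here</p>
    <table><tr><td>crazy table thing here</td></tr></table>
  </div>
  <div id="subsection">
    <h2> part 2 </h2>
    <p>insert text here</p>
    <table><tr><td>crazy table thing here</td></tr></table>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
pattern = re.compile(r"part 2", re.IGNORECASE)

# Keep only the subsections whose h2 text matches the pattern.
matches = [
    subsection
    for subsection in soup.select("div#reportsubsection #subsection")
    if subsection.find("h2", string=pattern)
]

for subsection in matches:
    # Printing the tag outputs the whole div, including the h2, text and table.
    print(subsection.prettify())
```

Printing (or calling `str()` on) the matched tag gives the full div, which is what the question asks for.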

Or, going back up with the .parent attribute:

for header in soup.find_all('h2', text=re.compile('part 2')):
    section = header.parent
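A minimal sketch of the going-back-up variant, again on assumed markup (here two sibling `div#reportsubsection` blocks, which may not match the real page). It uses `find_parent("div")` rather than `.parent`, on the assumption that the h2 might not be a direct child of the div you want; if it always is, `.parent` gives the same result. The `sections` list is just an illustrative name.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: sibling divs that share the same id.
html = """
<div id="reportsubsection">
  <h2> part 1 </h2>
  <p>insert text here</p>
</div>
<div id="reportsubsection">
  <h2> part 2 </h2>
  <p>insert text here</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

sections = []
for header in soup.find_all("h2", string=re.compile(r"part 2", re.IGNORECASE)):
    # Walk up from the heading to the nearest enclosing <div>.
    section = header.find_parent("div")
    sections.append(section)
    print(section)
```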

The trick is to narrow down your search as early as possible; the second option has to find all h2 elements in the whole document, while the first restricts the search to the subsections up front.
