Sebastian

Reputation: 967

Why does find_all fail to find a `div` element that is clearly there?

I want to get the text associated with a div element from a webpage parsed with BeautifulSoup.

print(searchResult)

<div id="results-from-CIDR"><a href="javascript:prefixContribsToggleAll();" id="prefixcontribs-tog">toggle all</a><span id="prefixcontribs-prog">Searching.</span> No changes were found for this wildcard/CIDR range.</div>

print(type(searchResult))

<class 'bs4.element.Tag'>

print(searchResult.find_all("div"))

[]

print(searchResult.find_all("div", attrs={"id":"results-from-CIDR"}))

[]

There is clearly a div there. Why doesn't it find it?

Upvotes: 3

Views: 488

Answers (1)

J. Taylor

Reputation: 4855

If you are just looking for the plain/visible text of the div, without any of the markup, you can access this text through the searchResult.text attribute.

The Tag.find_all() method searches only the descendants of the Tag it is called on; the Tag itself is never a candidate. So in your case it returns an empty list because the <div> contains no descendant <div> elements. The only descendants of the example Tag that you shared are an <a> tag, a <span>, and several instances of NavigableString (the bs4 object used to represent text in the parse tree). If you want find_all() to return the <div> in your example, you have to call it on the parent Tag (or rather, on any element that the target <div> is a descendant of).

For instance, if you do:

from bs4 import BeautifulSoup as Soup
# Naming the parser explicitly avoids the "no parser was explicitly specified" warning.
soup = Soup('<html><body><div id="results-from-CIDR"><a href="javascript:prefixContribsToggleAll();" id="prefixcontribs-tog">toggle all</a><span id="prefixcontribs-prog">Searching.</span> No changes were found for this wildcard/CIDR range.</div></body></html>', 'html.parser')
soup.find_all('div')  # the method is find_all, not findall

... then the element will be returned, because it is a descendant of soup (the root of the parsed document).
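To see the same thing starting from the Tag itself, here is a rough sketch. The variable names are mine, and the use of html.parser is an assumption, since the question does not say which parser was used. It shows that the search on the <div> comes back empty, which child tags it actually has, and that the same search from its parent does find it.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = (
    '<html><body><div id="results-from-CIDR">'
    '<a href="javascript:prefixContribsToggleAll();" id="prefixcontribs-tog">toggle all</a>'
    '<span id="prefixcontribs-prog">Searching.</span>'
    ' No changes were found for this wildcard/CIDR range.'
    '</div></body></html>'
)

# Assumption: html.parser; the original question does not name a parser.
soup = BeautifulSoup(html, "html.parser")
search_result = soup.find("div", id="results-from-CIDR")

# find_all() looks only below search_result, so the <div> itself is not matched.
inner_divs = search_result.find_all("div")

# The only Tag children are <a> and <span>; the rest is a text node.
child_tag_names = [c.name for c in search_result.children if isinstance(c, Tag)]

# Searching from the parent (<body>) does find the <div>.
from_parent = search_result.parent.find_all("div", id="results-from-CIDR")

print(inner_divs)        # []
print(child_tag_names)   # ['a', 'span']
print(from_parent[0] is search_result)  # True
```

If you already hold the Tag you want, as in the question, there is nothing left to search for: searchResult is the <div>.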

But again, if you're just trying to extract the text, use the .text attribute, which gives the visible text for a given tag, and any of its descendants.
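As a small sketch of the text extraction (again with my own variable names and html.parser as an assumption), the following compares .text, get_text() with a separator, and picking out only the text that sits directly inside the <div>, in case you want the message without the link and status text.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

html = (
    '<div id="results-from-CIDR">'
    '<a href="javascript:prefixContribsToggleAll();" id="prefixcontribs-tog">toggle all</a>'
    '<span id="prefixcontribs-prog">Searching.</span>'
    ' No changes were found for this wildcard/CIDR range.'
    '</div>'
)

# Assumption: html.parser; the original question does not name a parser.
search_result = BeautifulSoup(html, "html.parser").find("div")

# .text concatenates every text node under the tag, with nothing in between.
all_text = search_result.text

# get_text() can insert a separator and strip whitespace from each piece.
spaced_text = search_result.get_text(" ", strip=True)

# Keep only the text nodes that are direct children of the <div>,
# which skips the text inside <a> and <span>.
own_text = "".join(
    child for child in search_result.children
    if isinstance(child, NavigableString)
).strip()

print(all_text)     # toggle allSearching. No changes were found for this wildcard/CIDR range.
print(spaced_text)  # toggle all Searching. No changes were found for this wildcard/CIDR range.
print(own_text)     # No changes were found for this wildcard/CIDR range.
```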

Upvotes: 5
