Reputation: 967
I want to find the text associated with a div element from a webpage parsed with BeautifulSoup.
print(searchResult)
<div id="results-from-CIDR"><a href="javascript:prefixContribsToggleAll();" id="prefixcontribs-tog">toggle all</a><span id="prefixcontribs-prog">Searching.</span> No changes were found for this wildcard/CIDR range.</div>
print(type(searchResult))
<class 'bs4.element.Tag'>
print(searchResult.find_all("div"))
[]
print(searchResult.find_all("div", attrs={"id":"results-from-CIDR"}))
[]
There is clearly a div there. Why doesn't it find it?
Upvotes: 3
Views: 488
Reputation: 4855
If you are just looking for the plain, visible text of the div, without any of the markup, you can get it from the searchResult.text attribute.
The Tag.find_all() method only searches the descendants of the Tag it is called on, returning those whose name matches the given argument. In your case it returns an empty list because searchResult is itself the <div>, and it has no descendant <div> tags. The only descendants of the example Tag that you shared are an <a> tag, a <span>, and several instances of NavigableString (the bs4 object used to represent visible text in the DOM tree). If you wanted to use find_all() to return the <div> in your example, you'd have to call it from the parent Tag (or rather, from any element that the target <div> is a descendant of).
For instance, if you do:
from bs4 import BeautifulSoup as Soup
# Naming the parser explicitly avoids the "no parser was explicitly specified" warning
soup = Soup('<html><body><div id="results-from-CIDR"><a href="javascript:prefixContribsToggleAll();" id="prefixcontribs-tog">toggle all</a><span id="prefixcontribs-prog">Searching.</span> No changes were found for this wildcard/CIDR range.</div></body></html>', 'html.parser')
# The method is find_all(), with an underscore
soup.find_all('div')
... then the element will be returned, because it is a descendant of soup (the HTML document root).
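To make the difference concrete, here is a minimal sketch that runs the same search from both starting points. It assumes the built-in html.parser and a shortened copy of the markup from the question; the variable names (search_result, inner, outer, child_tags) are only illustrative.

```python
from bs4 import BeautifulSoup

html = (
    '<html><body><div id="results-from-CIDR">'
    '<a id="prefixcontribs-tog">toggle all</a>'
    '<span id="prefixcontribs-prog">Searching.</span>'
    ' No changes were found for this wildcard/CIDR range.'
    '</div></body></html>'
)
soup = BeautifulSoup(html, "html.parser")

# This plays the role of searchResult in the question: the <div> itself.
search_result = soup.find("div", id="results-from-CIDR")

# Searching from the <div> only looks at its descendants, so nothing matches.
inner = search_result.find_all("div")

# Searching from the document root finds it, because the <div> is a descendant of soup.
outer = soup.find_all("div", attrs={"id": "results-from-CIDR"})

# find_all(True) matches every descendant tag, which shows what the <div> actually contains.
child_tags = [tag.name for tag in search_result.find_all(True)]

print(inner)        # []
print(len(outer))   # 1
print(child_tags)   # ['a', 'span']
```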
But again, if you're just trying to extract the text, use the .text attribute, which gives the visible text of a given tag and all of its descendants.
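As a rough illustration of what .text returns for this markup, the sketch below also shows two variations that may be closer to what you want: get_text() with a separator, and collecting only the text that sits directly inside the div. It again assumes html.parser and a shortened copy of the markup; the variable names are made up for the example.

```python
from bs4 import BeautifulSoup, NavigableString

html = (
    '<div id="results-from-CIDR">'
    '<a id="prefixcontribs-tog">toggle all</a>'
    '<span id="prefixcontribs-prog">Searching.</span>'
    ' No changes were found for this wildcard/CIDR range.'
    '</div>'
)
search_result = BeautifulSoup(html, "html.parser").find("div")

# .text concatenates every string under the tag, including the <a> and <span> text.
all_text = search_result.text
print(all_text)
# toggle allSearching. No changes were found for this wildcard/CIDR range.

# get_text() takes a separator and can strip whitespace from each piece.
spaced_text = search_result.get_text(" ", strip=True)
print(spaced_text)
# toggle all Searching. No changes were found for this wildcard/CIDR range.

# Keep only the strings that are direct children of the div, skipping child tags.
own_text = "".join(
    child for child in search_result.children
    if isinstance(child, NavigableString)
).strip()
print(own_text)
# No changes were found for this wildcard/CIDR range.
```

The last variant is useful here because the link and progress text ("toggle all", "Searching.") are page chrome and not part of the message.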
Upvotes: 5