Reputation: 721
I'm trying to extract the text inside the span from the following HTML structure:
<div class="account-age">
<label></label>
<div>
<div>
<span>Text to extract</span>
</div>
</div>
</div>
I have the following Beautiful Soup code to do it:
from bs4 import BeautifulSoup as bs
soup = bs(html, "lxml")
div = soup.find("div", {"class": "account-age"})
span = div.children[1].children[0].children[0]
text = span.get_text()
Unfortunately, Beautiful Soup is throwing the error: 'list_iterator' object is not subscriptable. How can I fix this to extract the text I need?
Upvotes: 3
Views: 7494
Reputation: 3601
Try calling find('span') on the div you located. find searches all descendants, so no indexing is needed:
from bs4 import BeautifulSoup as bs
html ='''<div class="account-age">
<label></label>
<div>
<div>
<span>Text to extract</span>
</div>
</div>
</div>'''
soup = bs(html, 'html.parser')
div = soup.find("div", {"class": "account-age"})
span = div.find('span')
text = span.get_text()
print(text)
Result:
Text to extract
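One caveat: find returns None when nothing matches, so calling get_text() on the result raises AttributeError if the page layout differs. A minimal sketch of a guarded version follows; the helper name extract_span_text and its class_name parameter are made up for illustration and are not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup


def extract_span_text(html, class_name="account-age"):
    """Return the text of the first <span> inside the matching <div>, or None.

    Hypothetical helper: the name and signature are illustrative only.
    """
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_=class_name)
    if div is None:
        # No <div> with that class on the page.
        return None
    span = div.find("span")
    if span is None:
        # The <div> exists but contains no <span>.
        return None
    # strip=True trims leading and trailing whitespace from the text.
    return span.get_text(strip=True)


sample = '<div class="account-age"><div><span> Text to extract </span></div></div>'
print(extract_span_text(sample))
```

Returning None lets the caller decide how to handle a missing element instead of crashing in the middle of the scrape.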
Upvotes: 0
Reputation: 2824
The children property is an iterator, not a list. As the error says, it is not subscriptable. To get a list, use contents instead:
div.contents[1].contents[0].contents[0]
See documentation.
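Be aware that both contents and children include the whitespace between tags as text nodes, so with the HTML exactly as posted (one tag per line) index 1 of the root div is the label rather than the inner div. A rough sketch of one way around that, assuming you only care about element children; tag_children is a hypothetical helper, not a Beautiful Soup function:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = """<div class="account-age">
<label></label>
<div>
<div>
<span>Text to extract</span>
</div>
</div>
</div>"""

soup = BeautifulSoup(html, "html.parser")
root = soup.find("div", class_="account-age")


def tag_children(node):
    """Return only the element children of node, skipping text nodes.

    Hypothetical helper for illustration.
    """
    return [child for child in node.children if isinstance(child, Tag)]


# The newline after the opening tag is contents[0], so contents[1] is <label>.
print(root.contents[1].name)

outer = tag_children(root)[1]   # element children are <label>, then the outer <div>
inner = tag_children(outer)[0]  # the inner <div>
span = tag_children(inner)[0]   # the <span>
text = span.get_text()
print(text)
```

If the markup has no whitespace between tags, the plain contents indices from the answer line up as written.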
Upvotes: 1
Reputation: 46759
First locate the div, and then access the span text using attribute access, as follows:
from bs4 import BeautifulSoup as bs
html = """<div class="account-age">
<label></label>
<div>
<div>
<span>Text to extract</span>
</div>
</div>
</div>"""
soup = bs(html, "lxml")
div = soup.find('div', class_='account-age')
print(div.span.text)
This would display:
Text to extract
Upvotes: 0
Reputation: 214987
You can do this by chaining the tag names directly from the root div:
div.div.div.span.get_text()
# 'Text to extract'
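Each step of the chain returns None if that tag is missing, so the next attribute access would raise AttributeError. As an alternative sketch, the same path can be written as a CSS selector with select_one, which gives a single place to check for None. The selector string below is just my reading of the posted structure:

```python
from bs4 import BeautifulSoup

html = """<div class="account-age">
<label></label>
<div>
<div>
<span>Text to extract</span>
</div>
</div>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# ">" matches direct children only, mirroring div.div.div.span step by step.
span = soup.select_one("div.account-age > div > div > span")

# select_one returns None when the selector matches nothing.
text = span.get_text() if span is not None else None
print(text)
```

A looser selector such as "div.account-age span" would also match here and is less sensitive to changes in the nesting depth.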
Upvotes: 2