Reputation: 29
I have this HTML:
<div id="content">
<ul id="tree">
<li xmlns="" class="level top failed open">
<span><em class="time">
<div class="time">1.89 s</div>
</em>I need to get this text</span>
I only need the text that sits directly inside the span, outside all of the nested tags (that is, "I need to get this text").
I tried this piece of code:
path = document.find('li', class_='level top').find_all("em")[-1].next_sibling
if not path:
    path = document.find('li', class_='level top failed open').find_all("em")[-1].next_sibling
return path
But I get an error: AttributeError: 'NoneType' object has no attribute 'find_all'.
Does anybody know how to access this text?
Thank you!
Upvotes: 0
Views: 103
Reputation: 16189
You can use .contents on the span.
It returns a list of the tag's children, and the text you want is the last item, [-1].
html = '''
<div id="content">
<ul id="tree">
<li class="level top failed open" xmlns="">
<span>
<em class="time">
<div class="time">
1.89 s
</div>
</em>
I need to get this text
</span>
</li>
</ul>
</div>
'''
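from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
# print(soup.prettify())
# .contents[-1] is the last child of the span: the bare text node after </em>.
# strip() drops the newlines that surround it in the indented HTML above.
txt = soup.select_one('#tree > li > span').contents[-1].strip()
print(txt)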
Output:
I need to get this text
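As for the AttributeError in the question: a multi-word class_ value such as 'level top' is compared against each single class and against the full attribute string, so it does not match class="level top failed open". find() then returns None, and the chained .find_all() fails before the fallback branch is ever reached. Below is a minimal sketch of one way around it, assuming the question's `document` is a BeautifulSoup object like `soup` here; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

html = (
    '<div id="content"><ul id="tree">'
    '<li xmlns="" class="level top failed open">'
    '<span><em class="time"><div class="time">1.89 s</div></em>'
    'I need to get this text</span></li></ul></div>'
)
soup = BeautifulSoup(html, "html.parser")

# 'level top' is neither one of the individual classes nor the full
# attribute string, so this lookup finds nothing.
missing = soup.find("li", class_="level top")

# A CSS selector matches on the individual classes, in any combination.
li = soup.select_one("li.level.top")

text = None
if li is not None:                      # guard against a missing <li>
    ems = li.find_all("em")
    if ems:                             # guard against a missing <em>
        sibling = ems[-1].next_sibling  # the node right after </em>
        if sibling is not None:
            text = str(sibling).strip()

print(missing)
print(text)
```

Checking each lookup for None before chaining the next call keeps a missing element from turning into an AttributeError.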
Upvotes: 1
Reputation: 571
Try filtering on text with find_all:
.find_all("span", text=True)
because the text you want is inside the span element.
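One caveat, as far as I can tell: when a tag name and a text filter are combined, the filter is checked against the tag's .string, and .string is None for a tag with more than one child, so the call above may return nothing for this particular span. A hedged alternative sketch is to ask the span itself for its direct text nodes with string=True and recursive=False; the sample markup and the name `direct_text` are assumptions for illustration.

```python
from bs4 import BeautifulSoup

html = (
    '<ul id="tree"><li class="level top failed open">'
    '<span><em class="time"><div class="time">1.89 s</div></em>'
    'I need to get this text</span></li></ul>'
)
soup = BeautifulSoup(html, "html.parser")
span = soup.select_one("#tree > li > span")

# string=True selects text nodes; recursive=False limits the search to
# direct children, so the "1.89 s" nested inside <em> is skipped.
direct_text = [
    s.strip()
    for s in span.find_all(string=True, recursive=False)
    if s.strip()                        # drop whitespace-only nodes
]

print(direct_text)
```

This does not depend on the text being the last child, so it also copes with text that appears before or between the nested tags.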
Upvotes: 0