Reputation: 226
xmlstring="<a> <b> <c> Hello </c> </b> </a>"
I want to extract all the content "inside" the <b> </b> tags. For this I used:
from bs4 import BeautifulSoup as bs

content = "".join(xmlstring)
bs_content = bs(content, "lxml")
for b_text in bs_content.find_all("b"):
    inside_text = b_text.get_text()
but inside_text is Hello instead of <c> Hello </c>.
How do I write code to get <c> Hello </c> instead?
Upvotes: 1
Views: 304
Reputation: 4472
You can iterate over the tag's children attribute and skip the whitespace-only text nodes. find_all("b") returns the whole <b> <c> Hello </c> </b> element, so its children are a space, the <c> tag, and another space.
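from bs4 import BeautifulSoup as bs

xmlstring="<a> <b> <c> Hello </c> </b> </a>"
content = "".join(xmlstring)
bs_content = bs(content, "lxml")
for b_text in bs_content.find_all("b"):
    # Convert each child back to markup, dropping the single-space text nodes
    print(" ".join([str(i) for i in b_text.children if i != " "]))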
Output
<c> Hello </c>
Upvotes: 1
Reputation: 195438
from bs4 import BeautifulSoup
xmlstring="<a> <b> <c> Hello </c> </b> </a>"
soup = BeautifulSoup(xmlstring, 'lxml')
print( ''.join(str(c) for c in soup.select_one('b').contents) )
Prints:
<c> Hello </c>
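A possibly shorter variant of the same idea, as a minimal sketch: Beautiful Soup's Tag.decode_contents() returns a tag's inner markup as one string, so the join over the children is not needed. The helper name inner_markup is made up for illustration, and the built-in "html.parser" is used here only to avoid the lxml dependency; with "lxml" the result for this input should be the same.

```python
from bs4 import BeautifulSoup


def inner_markup(markup, tag_name):
    """Return the markup inside the first <tag_name> element, or None.

    inner_markup is a hypothetical helper name, not part of bs4.
    """
    # Assumption: "html.parser" instead of "lxml", so no extra install is needed
    soup = BeautifulSoup(markup, "html.parser")
    tag = soup.find(tag_name)
    if tag is None:
        return None
    # decode_contents() renders the children of the tag without the tag itself;
    # strip() drops the whitespace that surrounds <c> inside <b>
    return tag.decode_contents().strip()


xmlstring = "<a> <b> <c> Hello </c> </b> </a>"
print(inner_markup(xmlstring, "b"))
```

For documents with several <b> elements, the same call can be made on each tag returned by find_all("b").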
Upvotes: 2