xxx374562

Reputation: 226

How to get all nested tags and text in an xml, inside a particular tag?

xmlstring="<a> <b> <c> Hello </c> </b> </a>"

I want to extract everything inside the <b> </b> tags, including the nested tags.

For this I used:

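  from bs4 import BeautifulSoup as bs

  content = "".join(xmlstring)
  bs_content = bs(content, "lxml")

  for b_text in bs_content.find_all("b"):
      # get_text() returns only the text nodes, with all tags stripped
      inside_text = b_text.get_text()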

But inside_text contains only the text Hello, not <c> Hello </c>.

How do I write code to get <c> Hello </c> instead?

Upvotes: 1

Views: 304

Answers (2)

Leo Arad

Reputation: 4472

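You can iterate over the tag's children attribute. find_all("b") returns the whole <b> <c> Hello </c> </b> element, so its children are a whitespace string, the <c> element, and another whitespace string. Filtering out the whitespace strings leaves the second child, <c> Hello </c>.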

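from bs4 import BeautifulSoup as bs

xmlstring = "<a> <b> <c> Hello </c> </b> </a>"
content = "".join(xmlstring)
bs_content = bs(content, "lxml")
for b_text in bs_content.find_all("b"):
    # Skip the whitespace-only strings around <c> and serialize what is left
    print(" ".join(str(i) for i in b_text.children if i != " "))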

Output

<c> Hello </c>
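
The same result can probably be reached without looping over the children by hand. The sketch below assumes bs4's Tag.decode_contents(), which serializes everything inside a tag as one string. It uses the built-in "html.parser" instead of "lxml" only to avoid the extra dependency, and strip() to drop the surrounding whitespace; neither choice comes from the answer above.

```python
from bs4 import BeautifulSoup

xmlstring = "<a> <b> <c> Hello </c> </b> </a>"

# "html.parser" is an assumption made here to avoid requiring lxml
soup = BeautifulSoup(xmlstring, "html.parser")

b_tag = soup.find("b")

# decode_contents() serializes the children of <b>, not the <b> tag itself;
# strip() removes the whitespace-only strings on either side of <c>
inner = b_tag.decode_contents().strip()

print(inner)
```

Unlike the i != " " filter, strip() also handles leading or trailing whitespace that is longer than a single space, such as newlines or indentation.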

Upvotes: 1

Andrej Kesely

Reputation: 195438

from bs4 import BeautifulSoup


xmlstring = "<a> <b> <c> Hello </c> </b> </a>"
soup = BeautifulSoup(xmlstring, 'lxml')

# .contents lists the direct children of <b>; str() serializes each one with its tags
print(''.join(str(c) for c in soup.select_one('b').contents))

Prints:

 <c> Hello </c> 
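
If the input is well-formed XML, as it is in the question, a standard-library alternative might look like the sketch below. It swaps BeautifulSoup for xml.etree.ElementTree, and inner_xml is a hypothetical helper name introduced only for this illustration.

```python
import xml.etree.ElementTree as ET


def inner_xml(element):
    """Serialize everything inside element, excluding its own tags."""
    # element.text is the text before the first child (None if absent)
    parts = [element.text or ""]
    for child in element:
        # tostring() includes the child's tail, i.e. the text that follows it
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


xmlstring = "<a> <b> <c> Hello </c> </b> </a>"
root = ET.fromstring(xmlstring)   # root is the <a> element
b_elem = root.find("b")           # first <b> that is a direct child of <a>

inner = inner_xml(b_elem)
print(inner)
```

ElementTree raises a ParseError on malformed markup, so this only suits input that is known to be valid XML.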

Upvotes: 2
