Reputation: 11164
My soup contains an element with the class ats-description, which holds the description of a unit:
<div class="ats-description">
<p>Here is a paragraph</p>
<div>inner div</div>
<div>Another div</div>
<ul>
<li>Item1</li>
<li>Item2</li>
<li>Item3</li>
</ul>
</div>
I can easily grab this part with soup.select(".ats-description")[0].
Now I want to remove the outer <div class="ats-description"> wrapper and keep only the inner tags (to retain the text structure). How can I do that?
soup.select(".ats-description")[0].getText()
gives me all of the text inside it, like this:
'\nHere is a paragraph\ninner div\nAnother div\n\nItem1\nItem2\nItem3\n\n\n'
But that strips all the inner tags, leaving just unstructured text. I want to keep the tags as well.
Upvotes: 0
Views: 1528
Reputation: 6554
To get the inner HTML (the element's contents without its own opening and closing tags), use the .decode_contents() method:
innerHTML = soup.select_one('.ats-description').decode_contents()
print(innerHTML)
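For anyone who wants to run this end to end, here is a minimal sketch using the HTML from the question. The variable names (wrapper, inner_html) are my own, and I am assuming the built-in html.parser backend so no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """<div class="ats-description">
<p>Here is a paragraph</p>
<div>inner div</div>
<div>Another div</div>
<ul>
<li>Item1</li>
<li>Item2</li>
<li>Item3</li>
</ul>
</div>"""

# html.parser is an assumption here; any installed parser should work.
soup = BeautifulSoup(html, "html.parser")

# select_one() returns the first match (or None if nothing matches).
wrapper = soup.select_one(".ats-description")

# decode_contents() renders the children as a string,
# leaving out the wrapper's own opening and closing tags.
inner_html = wrapper.decode_contents()
print(inner_html)
```

The result is a plain string, so the inner tags are kept as markup but are no longer navigable soup objects.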
Upvotes: 1
Reputation: 2619
Try this: pass a list of tag names to find_all(), which returns the matching inner tags as a list of Tag objects.
from bs4 import BeautifulSoup
html="""<div class="ats-description">
<p>Here is a paragraph</p>
<div>inner div</div>
<div>Another div</div>
<ul>
<li>Item1</li>
<li>Item2</li>
<li>Item3</li>
</ul>
</div>"""
soup = BeautifulSoup(html, 'lxml')
# find_all() accepts a list of tag names and returns every matching descendant
print(soup.select_one("div.ats-description").find_all(['p','div','ul']))
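If you don't know in advance which tag names appear inside the wrapper, a variation that might help is to ask only for the direct children with find_all(True, recursive=False). This is a sketch rather than part of the answer above; the names wrapper, children and inner are mine, and I am assuming html.parser instead of lxml.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """<div class="ats-description">
<p>Here is a paragraph</p>
<div>inner div</div>
<div>Another div</div>
<ul>
<li>Item1</li>
<li>Item2</li>
<li>Item3</li>
</ul>
</div>"""

soup = BeautifulSoup(html, "html.parser")  # parser choice is an assumption
wrapper = soup.select_one("div.ats-description")

# True matches any tag; recursive=False limits the search to direct
# children, so the nested <li> tags are not returned as separate items.
children = wrapper.find_all(True, recursive=False)
names = [child.name for child in children]
print(names)

# Each child is still a Tag; str() turns it back into markup.
inner = "\n".join(str(child) for child in children)
print(inner)
```

Unlike decode_contents(), this keeps the children as Tag objects until you join them, which is handy if you want to filter or modify them first.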
Upvotes: 0