Ahasanul Haque

Reputation: 11164

How to get all the tags (with content) under a certain class with BeautifulSoup?

My soup contains an element with the class ats-description that holds the description of a unit.

<div class="ats-description">
 <p>Here is a paragraph</p>
 <div>inner div</div>
 <div>Another div</div>
 <ul>
    <li>Item1</li>
    <li>Item2</li>
    <li>Item3</li>
 </ul>
</div>

I can easily grab this part with soup.select(".ats-description")[0]. Now I want to strip the outer <div class="ats-description"> wrapper and keep only the inner tags, so that the text structure is retained. How can I do that?

soup.select(".ats-description")[0].getText() gives me all the text inside, like this:

'\nHere is a paragraph\ninner div\nAnother div\n\nItem1\nItem2\nItem3\n\n\n'

But it discards all the inner tags, leaving only unstructured text. I want to keep the tags as well.

Upvotes: 0

Views: 1528

Answers (2)

uingtea

Reputation: 6554

To get the inner HTML (what JavaScript calls innerHTML), use the .decode_contents() method:

innerHTML = soup.select_one('.ats-description').decode_contents()
print(innerHTML)
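A self-contained sketch of the same idea, run against the HTML from the question. It assumes the built-in "html.parser" backend so no extra package is required. It contrasts decode_contents() (children only) with decode() (wrapper included), and also shows Tag.unwrap(), a different bs4 method that removes the wrapper from the tree in place while leaving its children behind.

```python
from bs4 import BeautifulSoup

html = """<div class="ats-description">
 <p>Here is a paragraph</p>
 <div>inner div</div>
 <div>Another div</div>
 <ul>
    <li>Item1</li>
    <li>Item2</li>
    <li>Item3</li>
 </ul>
</div>"""

# html.parser ships with Python, so no third-party parser is needed
soup = BeautifulSoup(html, "html.parser")
wrapper = soup.select_one(".ats-description")

# decode_contents() serializes only the children: the "inner HTML"
inner_html = wrapper.decode_contents()

# decode() (same as str(tag)) includes the wrapper tag itself: the "outer HTML"
outer_html = wrapper.decode()

# Alternative: unwrap() removes the wrapper tag from the tree in place,
# leaving its children where the wrapper used to be
wrapper.unwrap()

print(inner_html)
print(soup)
```

Use decode_contents() when you just need the markup as a string; use unwrap() when you want to keep working with the modified soup afterwards.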

Upvotes: 1

Samsul Islam

Reputation: 2619

Try this: pass a list of tag names to find_all() on the matched element.

from bs4 import BeautifulSoup

html="""<div class="ats-description">
 <p>Here is a paragraph</p>
 <div>inner div</div>
 <div>Another div</div>
 <ul>
    <li>Item1</li>
    <li>Item2</li>
    <li>Item3</li>
 </ul>
</div>"""

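# 'lxml' requires the lxml package to be installed
soup = BeautifulSoup(html, 'lxml')
# find_all() with a list matches any of the listed tag names among the descendants
print(soup.select_one("div.ats-description").find_all(['p','div','ul']))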
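Note that find_all() searches all descendants by default, so a div nested inside another inner div would also be matched. A sketch that limits the search to direct children, assuming the "html.parser" backend, uses recursive=False, or iterates over .children and keeps only Tag objects (this skips the whitespace strings between tags and needs no list of names):

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = """<div class="ats-description">
 <p>Here is a paragraph</p>
 <div>inner div</div>
 <div>Another div</div>
 <ul>
    <li>Item1</li>
    <li>Item2</li>
    <li>Item3</li>
 </ul>
</div>"""

soup = BeautifulSoup(html, "html.parser")
wrapper = soup.select_one("div.ats-description")

# recursive=False restricts find_all() to the wrapper's direct children
direct = wrapper.find_all(["p", "div", "ul"], recursive=False)

# Every direct child that is a tag, whatever its name
all_children = [child for child in wrapper.children if isinstance(child, Tag)]

for tag in all_children:
    print(tag)

# Why recursive=False matters: with nested divs, the default also
# returns the inner-inner div
nested = BeautifulSoup(
    '<div class="ats-description"><div><div>x</div></div></div>', "html.parser"
)
outer = nested.select_one("div.ats-description")
deep_count = len(outer.find_all("div"))                      # both nested divs
shallow_count = len(outer.find_all("div", recursive=False))  # direct child only
```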

Upvotes: 0
