Reputation: 93
I need to extract the HTML (tags together with their text) from inside a single element on a page. For example:
<html>
<body>
<div class="post">
text <p> text </p> text <a> text </a>
<span> text </span></div>
<div class="post">
another text <p> text </p></div>
</body>
</html>
I need the HTML inside the first <div class="post">, with the tags included:
text <p> text </p> text <a> text </a>
<span> text </span>
I can extract only the text with this XPath: (//div[@class="post"])[1]/descendant-or-self::*[not(name()="script")]/text()
Result: text text text text text
I also tried: (//div[@class="post_body"])[1]/node()
But I don't know how to build a single string from the result.
P.S. Alternatively, please suggest another approach, for example with BeautifulSoup. Please help.
Upvotes: 1
Views: 300
Reputation: 61253
Use the find() method to get the first div.
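from bs4 import BeautifulSoup
soup = BeautifulSoup("""<html>
<body>
<div class="post">
text <p> text </p> text <a> text </a>
<span> text </span></div>
<div class="post">
another text <p> text </p></div>
</body>
</html>""", "html.parser")  # name the parser explicitly to avoid the bs4 warning
# find() returns only the first matching element
first_div = soup.find('div', attrs={'class': 'post'})
# Text nodes are stripped of whitespace; child tags are serialized back to HTML
first_div_html = [child.strip() if isinstance(child, str) else str(child) for child in first_div]
print(''.join(first_div_html))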
Output
text<p> text </p>text<a> text </a><span> text </span>
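Note that the list comprehension strips whitespace from the text nodes, so the words run directly into the neighbouring tags. If the original spacing matters, a possible alternative is the tag's decode_contents() method, which should return the element's inner HTML as one string. A minimal sketch, assuming the same sample markup and the built-in "html.parser" backend (the names html and inner_html are only illustrative):

```python
from bs4 import BeautifulSoup

# Same sample markup as above; only the first <div class="post"> is wanted.
html = """<html>
<body>
<div class="post">
text <p> text </p> text <a> text </a>
<span> text </span></div>
<div class="post">
another text <p> text </p></div>
</body>
</html>"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching element only.
first_div = soup.find("div", attrs={"class": "post"})

# decode_contents() serializes the children (text nodes and tags) but not
# the <div> wrapper itself, keeping the whitespace between them.
inner_html = first_div.decode_contents()

print(inner_html.strip())
```

The strip() call only trims the leading and trailing newline; the spacing inside the div is left as it was in the source.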
Upvotes: 1