Dim

Reputation: 93

XPath: how to extract the HTML from one tag?

I need to extract the contents of one tag on a page, keeping both the text and the nested HTML tags. For example:

<html>
 <body>
  <div class="post">
   text <p> text </p> text <a> text </a>
   <span> text </span>
  </div>
  <div class="post">
   another text <p> text </p>
  </div>
 </body>
</html>

I need the HTML inside the first <div class="post">:

text <p> text </p> text <a> text </a>
   <span> text </span>

with the tags included.

With XPath I can extract only the text: "(//div[@class="post"])[1]/descendant-or-self::*[not(name()="script")]/text()" gives: text text text text text

I also tried "(//div[@class="post_body"])[1]/node()", but I don't know how to build a string from the result.

P.S. Alternatively, please suggest another approach (for example, BeautifulSoup). Please help.

Upvotes: 1

Views: 300

Answers (1)

Sede

Reputation: 61253

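Use the find() method to get the first div; it returns only the first match. Then iterate over its children, stripping the bare strings and converting the tags with str().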

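from bs4 import BeautifulSoup

soup = BeautifulSoup("""<html>
     <body>
      <div class="post">
       text <p> text </p> text <a> text </a>
       <span> text </span></div>
      <div class="post">
       another text <p> text </p></div>
     </body>
    </html>""", "html.parser")  # name the parser explicitly to avoid a warning

# find() returns only the first matching tag
first_div = soup.find('div', attrs={'class': 'post'})

# Iterating over a tag yields its direct children:
# strip bare text nodes, serialize nested tags with str()
parts = [child.strip() if isinstance(child, str) else str(child) for child in first_div]
print(''.join(parts))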

Output

text<p> text </p>text<a> text </a><span> text </span>
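
If the whitespace between the text and the tags should be kept as it is in the source, a shorter option may be the tag's decode_contents() method, which returns the inner markup as a single string. The following is a minimal sketch rather than the only way to do it; it assumes the same sample HTML as above, and the names html, first_div and inner_html are only illustrative.

```python
from bs4 import BeautifulSoup

# Sample markup from the question, with both divs closed
html = """<html>
 <body>
  <div class="post">
   text <p> text </p> text <a> text </a>
   <span> text </span></div>
  <div class="post">
   another text <p> text </p></div>
 </body>
</html>"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag only
first_div = soup.find("div", class_="post")

# decode_contents() serializes the children (text and tags) without the
# enclosing <div>; strip() only trims leading and trailing whitespace
inner_html = first_div.decode_contents().strip()
print(inner_html)
```

Unlike the list comprehension above, this keeps the original spacing and line breaks between the child nodes, so the result is closer to the raw source.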

Upvotes: 1
