蔡忠振

Reputation: 23

How to get all child elements using BeautifulSoup framework

I have an HTML document like the one below, and I want to get the content inside the section:

<body>
  <section class="post-content">
    <h1>title</h1>
    <div>balabala</div>
  </section>
</body>

When I use the following code

soup.find_all("section", {"class": "post-content"})

I get

<section class="post-content">
  <h1>title</h1>
  <div>balabala</div>
</section>

But what I want is only what is inside the section. What should I do?

Upvotes: 2

Views: 2113

Answers (1)

Michael M.

Reputation: 11070

You can use the .findChildren() method and a list comprehension:

import bs4

soup = bs4.BeautifulSoup("""
<body>
    <section class="post-content">
        <h1>title</h1>
        <div>part one</div>
    </section>
    <section class="post-content">
        <h1>title2</h1>
        <div>part two</div>
    </section>
</body>
                         """, 'html.parser')

els = soup.find_all("section", {"class": "post-content"})
els = [list(el.findChildren()) for el in els]
print(els)  # => [[<h1>title</h1>, <div>part one</div>], [<h1>title2</h1>, <div>part two</div>]]

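The soup.find_all() call returns a list of matching elements, and the list comprehension loops over each one and collects its children into a list. el.findChildren() is the older name for el.find_all(); it already returns a list-like ResultSet, so list() is not strictly required and only converts it to a plain list. Note that by default it returns every descendant tag, not just the direct children; the two coincide here only because the sections contain no deeper nesting.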
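If only the direct children, or the inner HTML as a string, are wanted, a minimal sketch along these lines may help. It assumes a single section and adds a nested <b> tag (not in the original question) purely to show the difference between direct children and all descendants; the variable names are illustrative.

```python
import bs4

html = """
<body>
    <section class="post-content">
        <h1>title</h1>
        <div>part <b>one</b></div>
    </section>
</body>
"""
soup = bs4.BeautifulSoup(html, "html.parser")
section = soup.find("section", {"class": "post-content"})

# Direct child tags only: recursive=False stops find_all() from descending.
direct_children = section.find_all(recursive=False)

# Default behaviour: every descendant tag, including the nested <b>.
all_descendants = section.find_all()

# Inner HTML as a string, without the surrounding <section> tag.
inner_html = section.decode_contents().strip()

print([tag.name for tag in direct_children])
print([tag.name for tag in all_descendants])
print(inner_html)
```

Which of the three fits depends on whether the goal is Tag objects to process further or the markup itself; decode_contents() is probably the closest match to "what is inside the section".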

Upvotes: 1
