elbourne

Reputation: 45

Wrap groupings of tags with Python BeautifulSoup

Is there a way to use BeautifulSoup to wrap an element tag around a grouping of tags?

I have a document like this:

<h1>Heading for Sec 1</h1>
    <p>some text sec 1</p>
    <p>some text sec 1</p>
    <p>some text sec 1</p>

<h1>Heading for Sec 2</h1>
    <p>some text sec 2</p>
    <p>some text sec 2</p>
    <p>some text sec 2</p>

<h1>Heading for Sec 3</h1>
    <p>some text sec 3</p>
    <p>some text sec 3</p>

I need to wrap each grouping in a <div> tag. Each grouping begins with an <h1> tag. So the output would be:

<div>
<h1>Heading for Sec 1</h1>
    <p>some text sec 1</p>
    <p>some text sec 1</p>
    <p>some text sec 1</p>
</div>

<div>
<h1>Heading for Sec 2</h1>
    <p>some text sec 2</p>
    <p>some text sec 2</p>
    <p>some text sec 2</p>
</div>

<div>
<h1>Heading for Sec 3</h1>
    <p>some text sec 3</p>
    <p>some text sec 3</p>
</div>

Upvotes: 2

Views: 141

Answers (1)

Andrej Kesely

Reputation: 195408

You can wrap each <h1> in a new <div> and then move the <p> tags that follow it into that <div>:

from bs4 import BeautifulSoup

html_doc = """\
<h1>Heading for Sec 1</h1>
    <p>some text sec 1</p>
    <p>some text sec 1</p>
    <p>some text sec 1</p>

<h1>Heading for Sec 2</h1>
    <p>some text sec 2</p>
    <p>some text sec 2</p>
    <p>some text sec 2</p>

<h1>Heading for Sec 3</h1>
    <p>some text sec 3</p>
    <p>some text sec 3</p>"""

soup = BeautifulSoup(html_doc, "html.parser")

last_div = None
for tag in soup.select("h1, p"):   # or soup.find_all(["h1", "p"], recursive=False) if there are inner tags
    if tag.name == "h1":
        last_div = tag.wrap(soup.new_tag("div"))
        last_div.insert(0, "\n")
        last_div.append("\n")
        continue

    last_div.append(tag)
    last_div.append("\n")

# remove unnecessary spaces:
soup.smooth()
for t in soup.find_all(string=True, recursive=False):   # string= replaces the deprecated text= argument
    t.replace_with("\n")

print(soup)

Prints:

<div>
<h1>Heading for Sec 1</h1>
<p>some text sec 1</p>
<p>some text sec 1</p>
<p>some text sec 1</p>
</div>
<div>
<h1>Heading for Sec 2</h1>
<p>some text sec 2</p>
<p>some text sec 2</p>
<p>some text sec 2</p>
</div>
<div>
<h1>Heading for Sec 3</h1>
<p>some text sec 3</p>
<p>some text sec 3</p>
</div>
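
If the sections can contain tags other than <p> (lists, tables and so on), a variant that walks the siblings of each <h1> may be easier to adapt. The following is only a sketch under some assumptions: the helper name wrap_sections and its heading/wrapper parameters are made up for illustration, and it assumes every heading and its content are siblings under the same parent. Anything that appears before the first heading is left untouched. Whitespace is not normalised here, so the cleanup step from the code above would still be needed for pretty output.

```python
from bs4 import BeautifulSoup


def wrap_sections(soup, heading="h1", wrapper="div"):
    """Wrap each heading and the siblings that follow it in a new wrapper tag.

    Hypothetical helper: assumes headings and their content are siblings.
    """
    for h in soup.find_all(heading):
        # Collect everything up to (but not including) the next heading.
        # The list is built before the tree is modified.
        group = []
        for sib in h.next_siblings:
            if getattr(sib, "name", None) == heading:
                break
            group.append(sib)

        # wrap() replaces the heading with the new tag and returns that tag.
        section = h.wrap(soup.new_tag(wrapper))

        # append() detaches each node from its old position and moves it.
        for node in group:
            section.append(node)
    return soup


demo = BeautifulSoup(
    "<h1>A</h1><p>a1</p><p>a2</p><h1>B</h1><p>b1</p>", "html.parser"
)
print(wrap_sections(demo))
# <div><h1>A</h1><p>a1</p><p>a2</p></div><div><h1>B</h1><p>b1</p></div>
```

Because the grouping is driven by next_siblings rather than by a fixed tag list, text nodes and any other tags between two headings end up in the same <div>.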

Upvotes: 1
