Delgan

Reputation: 19697

Get an element and its parent ONLY, using BeautifulSoup in Python

I would like to use BeautifulSoup's find_all function to retrieve every <li> tag, each together with its parent.

<div name="div1">
    <li>Test 1</li>
    <li>Test 2</li>
</div>

If I try with this code:

tags = soup.find_all("li")
print(tags[0].parent)

This will print:

<div name="div1">
    <li>Test 1</li>
    <li>Test 2</li>
</div>

This is because the parent contains both <li> tags.

What I expect is:

<div name="div1">
    <li>Test 1</li>
</div>

How can I get this result?

Upvotes: 0

Views: 2397

Answers (1)

ofrommel

Reputation: 2177

If I understand correctly what you want, you can get it by creating an empty copy of the parent for each list element and wrapping the element in that copy:

from bs4 import BeautifulSoup

txt = """<div name="div1">
        <li>Test 1</li>
        <li>Test 2</li>
        </div>"""

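def clone(soup, tag):
    # Create an empty tag with the same name and attributes (no children).
    newtag = soup.new_tag(tag.name)
    for attr in tag.attrs:
        newtag[attr] = tag[attr]
    return newtag

# Name the parser explicitly to avoid the "no parser specified" warning.
soup = BeautifulSoup(txt, "html.parser")
tags = soup.find_all("li")
for tag in tags:
    # wrap() modifies the tree in place and returns the new wrapper tag.
    print(tag.wrap(clone(soup, tag.parent)))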
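
Note that wrap() changes the parsed document: after the loop, each <li> sits inside its own new <div> nested within the original one. If the original tree needs to stay untouched, one possible variant is to build the result from detached copies instead. The following is only a sketch under that assumption; the helper name with_parent_only is made up for this example, and the copy is obtained by simply re-parsing the element's own markup:

```python
from bs4 import BeautifulSoup

html = """<div name="div1">
    <li>Test 1</li>
    <li>Test 2</li>
</div>"""


def with_parent_only(li):
    """Return a detached copy of li wrapped in a childless copy of its parent.

    Hypothetical helper for illustration; it is not part of BeautifulSoup.
    """
    # Re-parse the element's own markup to get an independent copy of it.
    scratch = BeautifulSoup(str(li), "html.parser")
    li_copy = scratch.find(li.name)
    # Recreate the parent by name and attributes, without its children.
    wrapper = scratch.new_tag(li.parent.name)
    for attr, value in li.parent.attrs.items():
        wrapper[attr] = value
    # Move the copied element into the new, otherwise empty parent.
    wrapper.append(li_copy)
    return wrapper


soup = BeautifulSoup(html, "html.parser")
results = [with_parent_only(li) for li in soup.find_all("li")]
for result in results:
    print(result)
```

Each printed result contains a single <li> inside a copy of the <div>, and soup itself still holds both <li> tags directly under the original <div>.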

Upvotes: 2
