Edward Porter

Reputation: 145

Python - BeautifulSoup findParent by attribute

I'm hoping to use the findParent() method in BeautifulSoup to find a particular tag's parent that has an id attribute. For example, consider the following sample XML:

<monograph>
    <section id="1234">
        <head>Test Heading</head>
        <p>Here's a paragraph with some text in it.</p>
    </section>
</monograph>

Assuming I've matched something in the paragraph, I'd like to use findParent to indiscriminately find the first parent up the tree with an id attribute. Something like:

 for hit in monograph(text="paragraph with"):
     containername = hit.findParent(re.compile([A-Za-z]+), {id}).name

However, the preceding code doesn't return any hits.

Upvotes: 1

Views: 2219

Answers (1)

Martijn Pieters

Reputation: 1124558

Use id=True to match an element that has an id attribute, regardless of the value of the attribute:

hit.find_parent(id=True)

Conversely, id=False finds the first parent element that has no id attribute.

Note that you should really use the lower_case_with_underscores style for BeautifulSoup methods; findParent is the BeautifulSoup 3 spelling, which is deprecated in BeautifulSoup 4 in favor of find_parent.

Demo:

>>> from bs4 import BeautifulSoup
>>> sample = '''\
... <monograph>
...     <section id="1234">
...         <head>Test Heading</head>
...         <p>Here's a paragraph with some text in it.</p>
...     </section>
... </monograph>
... '''
>>> soup = BeautifulSoup(sample, 'xml')
>>> str(soup.p)
"<p>Here's a paragraph with some text in it.</p>"
>>> print(soup.p.find_parent(id=True).prettify())
<section id="1234">
 <head>
  Test Heading
 </head>
 <p>
  Here's a paragraph with some text in it.
 </p>
</section>

>>> print(soup.p.find_parent(id=False).prettify())
<monograph>
 <section id="1234">
  <head>
   Test Heading
  </head>
  <p>
   Here's a paragraph with some text in it.
  </p>
 </section>
</monograph>

Upvotes: 3
