Karthikeyan KR

Reputation: 1174

BeautifulSoup exclude a tag in findAll

In BeautifulSoup, how can I exclude tags nested inside a particular tag when using findAll?

Consider this example: I want to find all the <p> tags in the HTML except the <p> tags within a <tr> tag.

soup.findAll(['p'])

The above code fetches all the <p> tags, but I need to exclude the <p> tags within a <tr> tag.

Upvotes: 4

Views: 1866

Answers (2)

Prayson W. Daniel

Reputation: 15578

You could use .select with the :not() pseudo-class. Examples:

Select all <p> tags but exclude the <p> tags that are direct children of a <tr> tag:

soup.select('p:not(tr > p)')

Select all <p> tags but exclude the <p> tags anywhere within a <tr> tag, at any depth:

soup.select('p:not(tr p)')

Select all <p> and <h2> tags but exclude the <p> tags within a <tr> tag. The :not() must be attached to p, since it only applies to the selector it follows:

soup.select('p:not(tr p), h2')
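To see how the child (>) and descendant (space) combinators differ inside :not(), here is a minimal sketch. The sample HTML and the texts helper are made up for illustration, and it assumes a Beautiful Soup version that bundles Soup Sieve (4.7.0 or later) together with html.parser, which leaves a <p> placed directly inside a <tr> where it is. Note that :not() applies only to the selector it is attached to, so the exclusion is written on p and h2 is listed separately.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: p 1 is a direct child of tr,
# p 2 is nested deeper inside a tr, p 3 and p 4 are outside any tr.
html = """
<tr><p>1</p></tr>
<tr><td><p>2</p></td></tr>
<p>3</p>
<div><p>4</p></div>
<h2>5</h2>
"""

soup = BeautifulSoup(html, "html.parser")


def texts(selector):
    """Return the text of every tag matched by a CSS selector."""
    return [tag.get_text() for tag in soup.select(selector)]


# Child combinator: only drops <p> tags whose immediate parent is <tr>.
not_direct_child = texts("p:not(tr > p)")

# Descendant combinator: drops <p> tags with a <tr> ancestor at any depth.
not_descendant = texts("p:not(tr p)")

# Same exclusion for <p>, plus every <h2>.
with_h2 = texts("p:not(tr p), h2")

print(not_direct_child)
print(not_descendant)
print(with_h2)
```

For the question as asked, the descendant form 'p:not(tr p)' is probably the one you want.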

Upvotes: 3

bertdida

Reputation: 5288

If I understand correctly, you want to select all p tags that don't have a tr ancestor at any level.

You can select all p tags and then filter the results using the findParent function. findParent returns the nearest ancestor with the given tag name, or None if there is no such ancestor.

from bs4 import BeautifulSoup

html = """
  <tr>
    <p>1</p>
  </tr>
  
  <tr>
    <td>
      <p>2</p>
    </td>
  </tr>
  
  <p>3</p>
  
  <div>
    <p>4</p>
  </div>
"""

soup = BeautifulSoup(html, "html.parser")
print([p for p in soup.findAll('p') if not p.findParent('tr')])
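If you would rather keep everything inside a single findAll / find_all call, the same check can be passed as a filter function. This is only a sketch of that variant: the p_outside_tr name is made up here, and it uses the find_all and find_parent spellings, which are the current names for findAll and findParent.

```python
from bs4 import BeautifulSoup

html = """
<tr><p>1</p></tr>
<tr><td><p>2</p></td></tr>
<p>3</p>
<div><p>4</p></div>
"""

soup = BeautifulSoup(html, "html.parser")


def p_outside_tr(tag):
    """Match <p> tags that have no <tr> ancestor at any level."""
    return tag.name == "p" and tag.find_parent("tr") is None


# find_all calls the function on every tag and keeps those returning True.
outside = soup.find_all(p_outside_tr)
print(outside)
```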

Upvotes: 2
