Reputation: 1174
In BeautifulSoup, how can I exclude tags nested inside a particular tag when using findAll?
For example, I want to find all the <p> tags in the HTML except those inside a <tr> tag.
soup.findAll(['p'])
The code above fetches all the <p> tags, but I need to exclude the <p> tags inside a <tr> tag.
Upvotes: 4
Views: 1866
Reputation: 15578
You could use .select with a CSS selector. Examples:
Select all <p> tags but exclude the <p> tags that are direct children of a <tr> tag:
soup.select('p:not(tr > p)')
Select all <p> tags but exclude the <p> tags that are nested inside a <tr> tag at any depth:
soup.select('p:not(tr p)')
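Select all <p> and <h2> tags but exclude the <p> tags that are nested inside a <tr> tag. Note that :not() applies only to the selector it is attached to, so it goes on p rather than on h2:
soup.select('p:not(tr p), h2')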
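To see how the child combinator (tr > p) differs from the descendant combinator (tr p), here is a minimal sketch run against a small made-up document. The sample HTML and the texts helper are assumptions for illustration only, and it assumes a BeautifulSoup release recent enough to delegate .select to the soupsieve package (4.7 or later, as far as I know), since older versions may not accept complex selectors inside :not().

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, not taken from the question.
html = """
<tr><p>1</p></tr>
<tr><td><p>2</p></td></tr>
<p>3</p>
<h2>title</h2>
<div><p>4</p></div>
"""
# html.parser keeps the markup as written; other parsers may restructure a bare <tr>.
soup = BeautifulSoup(html, "html.parser")


def texts(selector):
    """Return the text of every element matched by a CSS selector."""
    return [el.get_text() for el in soup.select(selector)]


# Excludes only <p> tags that are direct children of a <tr>.
direct = texts("p:not(tr > p)")

# Excludes <p> tags nested anywhere inside a <tr>.
any_depth = texts("p:not(tr p)")

# :not() is attached to p, so every <h2> is kept.
with_h2 = texts("p:not(tr p), h2")

print(direct)
print(any_depth)
print(with_h2)
```

With this input, paragraph 2 survives the first selector because its parent is a <td>, not the <tr> itself, while the second selector drops it as well.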
Upvotes: 3
Reputation: 5288
If I understand correctly, you want to select all p tags that don't have a tr as a parent at any level.
You can select all p tags and then filter the results using the findParent function. findParent returns the first parent with the given tag name, or None if there is no such parent.
from bs4 import BeautifulSoup
html = """
<tr>
<p>1</p>
</tr>
<tr>
<td>
<p>2</p>
</td>
</tr>
<p>3</p>
<div>
<p>4</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
print([p for p in soup.findAll('p') if not p.findParent('tr')])
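The same filter can be wrapped in a small reusable function. This is only a sketch: the name find_all_outside and its parameters are hypothetical, and it uses find_all and find_parent, the snake_case spellings that bs4 provides for findAll and findParent.

```python
from bs4 import BeautifulSoup


def find_all_outside(soup, name, excluded_ancestor):
    """Return all `name` tags that have no `excluded_ancestor` above them.

    Hypothetical helper; find_parent returns None when no matching
    ancestor exists, so those are the tags that are kept.
    """
    return [
        tag
        for tag in soup.find_all(name)
        if tag.find_parent(excluded_ancestor) is None
    ]


html = "<tr><p>1</p></tr><tr><td><p>2</p></td></tr><p>3</p><div><p>4</p></div>"
soup = BeautifulSoup(html, "html.parser")

kept = find_all_outside(soup, "p", "tr")
kept_texts = [p.get_text() for p in kept]
print(kept_texts)
```

Comparing against None explicitly makes the intent clearer than relying on the truthiness of a Tag object.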
Upvotes: 2