Reputation: 3592
As discussed in this question, one can easily get all divs with certain classes. But here I have a list of classes that I want to exclude, and I want to get all divs that don't have any of the classes in that list.
For example:
classToIgnore = ["class1", "class2", "class3"]
Now I want to get all divs that don't contain any of the classes in the list above. How can I achieve that?
Upvotes: 1
Views: 2059
Reputation: 8225
Alternate solution
soup.find_all('div', class_=lambda x: x not in classToIgnore)
Example
from bs4 import BeautifulSoup
html = """
<div class="c1"></div>
<div class="c1"></div>
<div class="c2"></div>
<div class="c3"></div>
<div class="c4"></div>
"""
soup = BeautifulSoup(html, 'html.parser')
classToIgnore = ["c1", "c2"]
print(soup.find_all('div', class_=lambda x: x not in classToIgnore))
Output
[<div class="c3"></div>, <div class="c4"></div>]
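One caveat with the lambda above: as far as I can tell, when a div has several classes, BeautifulSoup calls the class_ function once per class and accepts the div if any single call returns True, so something like class="c1 c5" would still be returned. A minimal sketch of a stricter filter that looks at the whole tag instead (the helper name has_no_ignored_class is made up for this example):

```python
from bs4 import BeautifulSoup

html = """
<div class="c1 c5"></div>
<div class="c2"></div>
<div class="c3"></div>
<div></div>
"""
soup = BeautifulSoup(html, "html.parser")
classToIgnore = ["c1", "c2"]
ignored = set(classToIgnore)


def has_no_ignored_class(tag):
    # Hypothetical helper: keep only divs whose class list shares
    # nothing with the ignore list. Divs without a class attribute
    # are kept because tag.get falls back to an empty list.
    return tag.name == "div" and ignored.isdisjoint(tag.get("class", []))


kept = soup.find_all(has_no_ignored_class)
print(kept)
```

Here the div with class="c1 c5" is dropped because one of its classes is in the list, while the div without any class is kept.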
If you are dealing with nested divs, try removing the unwanted divs from the tree with decompose() first, and then just call find_all('div'). Note that decompose() also removes everything inside each unwanted div.
for div in soup.find_all('div', class_=lambda x: x in classToIgnore):
    div.decompose()
print(soup.find_all('div'))
This might leave some extra whitespace behind in the remaining markup, but you can easily strip that off later.
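To make the nested case concrete, here is a small sketch of the decompose approach on made-up markup (the HTML and the variable names remaining_classes and text are just assumptions for illustration). It also shows one way to get rid of the leftover whitespace, using get_text(strip=True):

```python
from bs4 import BeautifulSoup

html = """
<div class="c3">
  <div class="c1">drop me</div>
  <div class="c4">keep me</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
classToIgnore = ["c1", "c2"]

# Remove every div that carries an ignored class, including its contents.
# Note that this modifies the soup in place.
for div in soup.find_all("div", class_=lambda x: x in classToIgnore):
    div.decompose()

# Only the outer div and the inner c4 div are left in the tree.
remaining_classes = [d.get("class") for d in soup.find_all("div")]

# strip=True trims each text fragment, so the whitespace left behind
# by the removed div does not show up in the result.
text = soup.get_text(strip=True)
print(remaining_classes, text)
```

Since decompose changes the tree, parse the HTML again (or work on a copy) if you still need the original markup afterwards.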
Upvotes: 2