Harshil Doshi

Reputation: 3592

Select all divs except ones with certain classes in BeautifulSoup

As discussed in this question, one can easily get all divs with certain classes. Here, though, I have a list of classes that I want to exclude, and I want to get all divs that don't have any class given in the list.

For example:

classToIgnore = ["class1", "class2", "class3"]

Now I want to get all divs that don't contain any of the classes in the list above. How can I achieve that?

Upvotes: 1

Views: 2059

Answers (2)

Bitto

Reputation: 8225

Alternate solution

soup.find_all('div', class_=lambda x: x not in classToIgnore)

Example

from bs4 import BeautifulSoup
html = """
<div class="c1"></div>
<div class="c1"></div>
<div class="c2"></div>
<div class="c3"></div>
<div class="c4"></div>
"""
soup = BeautifulSoup(html, 'html.parser')
classToIgnore = ["c1", "c2"]
print(soup.find_all('div', class_=lambda x: x not in classToIgnore))

Output

[<div class="c3"></div>, <div class="c4"></div>]
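One caveat, as far as I understand BeautifulSoup's matching: a function passed to class_ is tested against each class of a tag separately, so a div like class="c1 c4" can still match because "c4" is not in the list. A minimal sketch that checks the whole class list at once by filtering on the tag itself (the helper name has_no_ignored_class is only illustrative):

```python
from bs4 import BeautifulSoup

html = """
<div class="c1 c4"></div>
<div class="c3"></div>
<div></div>
"""
soup = BeautifulSoup(html, "html.parser")
classToIgnore = ["c1", "c2"]

def has_no_ignored_class(tag):
    # tag.get("class", []) is the tag's list of classes,
    # or [] when the tag has no class attribute at all.
    classes = tag.get("class", [])
    return tag.name == "div" and set(classes).isdisjoint(classToIgnore)

kept = soup.find_all(has_no_ignored_class)
print(kept)  # [<div class="c3"></div>, <div></div>]
```

Divs without any class are kept here; add a check on classes if you want to drop those as well.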

If you are dealing with nested divs, try removing the unwanted inner divs with decompose() and then just call find_all('div'):

for div in soup.find_all('div', class_=lambda x: x in classToIgnore):
    div.decompose()
print(soup.find_all('div'))

This might leave some extra whitespace in the tree, but you can strip that off easily later.
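A small worked example of the decompose approach on nested markup (the HTML is made up for illustration). Note that decompose() modifies the soup in place, so the removed divs are gone for any later queries:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an unwanted div nested inside a wanted one.
html = '<div class="c3"><div class="c1">inner</div><p>kept</p></div><div class="c2"></div>'
soup = BeautifulSoup(html, "html.parser")
classToIgnore = ["c1", "c2"]

# Remove every div carrying an ignored class, together with its contents.
for div in soup.find_all("div", class_=lambda x: x in classToIgnore):
    div.decompose()

remaining = soup.find_all("div")
print(remaining)  # [<div class="c3"><p>kept</p></div>]
```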

Upvotes: 2

Rajan

Reputation: 1497

Using a CSS selector, try this (the selector list inside :not() must not be quoted):

divs = soup.select("div:not(.class1, .class2, .class3)")
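A runnable sketch of the selector approach that builds the :not() list from classToIgnore. It assumes a BeautifulSoup version whose select() is backed by soupsieve (4.7 or later, as far as I know), an unquoted selector list inside :not(), and plain class names that need no CSS escaping:

```python
from bs4 import BeautifulSoup

html = """
<div class="c1"></div>
<div class="c1 c4"></div>
<div class="c2"></div>
<div class="c3"></div>
<div class="c4"></div>
"""
soup = BeautifulSoup(html, "html.parser")
classToIgnore = ["c1", "c2"]

# Turn ["c1", "c2"] into the selector "div:not(.c1, .c2)".
selector = "div:not({})".format(", ".join("." + c for c in classToIgnore))
divs = soup.select(selector)
print(divs)  # [<div class="c3"></div>, <div class="c4"></div>]
```

Because the selector looks at all of a tag's classes, the class="c1 c4" div is excluded as well.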


Upvotes: 5
