dwightjl

Reputation: 2324

BeautifulSoup find_all() to find elements that have one of several acceptable attribute values

Primary question

I know how to use find_all() to retrieve elements that have an attribute with a specific value, but I can't find any examples of how to retrieve elements whose attribute has one of several acceptable values. In my case I'm working with DITA XML and I want to retrieve topicref elements where the scope attribute is either absent, "local", or "peer".

I wrote a custom function that works, but there must be a smarter way to do this with the functions that are already present in BeautifulSoup. Here is my code:

from bs4 import BeautifulSoup

with open("./dita_source/envvariables.ditamap", "r") as file:
    # The with block closes the file automatically, so no explicit close() is needed.
    doc = BeautifulSoup(file)


def isLocal(element):
    if (element.name == "topicref"):
        if (not element.has_attr("scope") or element["scope"] in ("local", "peer")):
            return True
    return False

topicrefs = doc.find_all(isLocal)

Secondary question

Is there a way to use find_all() with both its standard filters and a custom function? I tried doc.find_all("topicref", isLocal), but that didn't work. I had to add the extra if (element.name == "topicref"): statement to my custom function instead.

Upvotes: 2

Views: 1301

Answers (2)

falsetru

Reputation: 368904

Specify topicref as the first argument (name) and pass a function for the scope keyword argument:

def isLocal(scope):
    return scope in (None, "local", "peer")

topicrefs = soup.find_all('topicref', scope=isLocal)

Or using lambda:

topicrefs = soup.find_all(
    'topicref',
    scope=lambda scope: scope in (None, "local", "peer")
)
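
To see what the attribute function receives, here is a minimal self-contained sketch. The sample markup, the href values, and the name is_local_scope are made up for illustration, and it uses the built-in "html.parser" only so it runs without extra dependencies (for a real ditamap you would probably want an XML parser). The function is called with the attribute's value, or with None when the element has no scope attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a ditamap; not taken from the question's file.
SAMPLE = """
<map>
  <topicref href="a.dita"></topicref>
  <topicref href="b.dita" scope="local"></topicref>
  <topicref href="c.dita" scope="peer"></topicref>
  <topicref href="d.dita" scope="external"></topicref>
  <keydef keys="k" scope="local"></keydef>
</map>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")


def is_local_scope(scope):
    # scope is the attribute value, or None if the attribute is missing.
    return scope in (None, "local", "peer")


# The name filter restricts matches to topicref; the keyword filter checks scope.
topicrefs = soup.find_all("topicref", scope=is_local_scope)
hrefs = [t["href"] for t in topicrefs]
print(hrefs)
```

The keydef element is skipped by the name filter, and d.dita is skipped because "external" is not in the accepted tuple.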

Upvotes: 0

schesis

Reputation: 59118

You can supply a list as the value of an attribute parameter to find_all(), and it will return elements where the attribute matches any of the items in that list:

>>> soup.find_all(scope=["row", "col"])
[
    <th scope="col">US $</th>,
    <th scope="col">Euro</th>,
    <th scope="row">Mon – Fri</th>,
    <th scope="row">Sat – Sun</th>,
]

... but there's no way to specify "attribute doesn't exist at all" in that list (neither None nor an empty string works). So for that, you do need a function.
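
As a rough side-by-side of the two approaches, the sketch below runs a list filter and a function filter over the same markup. The markup, the href values, and the variable names are invented for the example, and "html.parser" is used only to keep it dependency-free.

```python
from bs4 import BeautifulSoup

# Hypothetical sample; the first topicref deliberately has no scope attribute.
MARKUP = """
<map>
  <topicref href="a.dita"></topicref>
  <topicref href="b.dita" scope="local"></topicref>
  <topicref href="c.dita" scope="peer"></topicref>
  <topicref href="d.dita" scope="external"></topicref>
</map>
"""

soup = BeautifulSoup(MARKUP, "html.parser")

# List filter: matches elements whose scope equals any listed string.
by_list = soup.find_all("topicref", scope=["local", "peer"])
by_list_hrefs = [t["href"] for t in by_list]

# Function filter: also accepts elements with no scope attribute (value is None).
by_func = soup.find_all(
    "topicref",
    scope=lambda value: value is None or value in ("local", "peer"),
)
by_func_hrefs = [t["href"] for t in by_func]

print(by_list_hrefs)
print(by_func_hrefs)
```

The list version leaves out a.dita because that element has no scope attribute; the function version picks it up.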

Upvotes: 1
