BeautifulSoup, getting more returns than expected with regex

Question

Using BeautifulSoup, I have the following line:

dimensions = SOUP.select(".specs__title > h4", text=re.compile(r'Dimensions'))

However, it's returning more than just the tags that have a text of 'Dimensions' as shown in these results:

[Dimensions, Details, Warranty / Certifications]

Am I using the regex incorrectly with the way SOUP works?

facelessuser · Accepted Answer

select() doesn't accept a text keyword; that filter belongs to find_all(). The regex therefore has no effect here, and every h4 matched by the CSS selector comes back. Everything below assumes you are using BeautifulSoup 4.7+.

If you'd like to filter by text, you might be able to do something like this:

dimensions = SOUP.select(".specs__title > h4:contains(Dimensions)")

More information on the :contains() pseudo-class implementation is available here: https://facelessuser.github.io/soupsieve/selectors/#:contains.
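For illustration, here is a minimal sketch of the text-filtering selector against some made-up markup. The HTML below is hypothetical (the real page isn't shown in the question), and the sketch uses the :-soup-contains() spelling, which assumes Soup Sieve 2.1 or later, where :contains() was deprecated in favor of that name. On older versions, substitute :contains().

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the asker's page.
HTML = """
<div class="specs__title"><h4>Dimensions</h4></div>
<div class="specs__title"><h4>Details</h4></div>
<div class="specs__title"><h4>Warranty / Certifications</h4></div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Plain substring match on the element's text, done inside the selector.
# Assumes Soup Sieve 2.1+; use :contains(Dimensions) on older versions.
dimensions = soup.select(".specs__title > h4:-soup-contains(Dimensions)")

print([tag.get_text() for tag in dimensions])
```

Note that this is a substring test, not a regex, so it cannot express anchors or alternation.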

EDIT: To clarify, there is no way to incorporate regex directly into a select call currently. You would have to filter the elements after the fact to use regex. In the future there may be a way to use regex via some custom pseudo-class, but currently there is no such feature available in Soup Sieve (Beautiful Soup's select implementation in 4.7+).
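As a rough sketch of "filter after the fact": select the candidates with CSS first, then apply the compiled regex in ordinary Python. The markup and variable names here are made up for the example, not taken from the asker's page.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the asker's page.
HTML = """
<div class="specs__title"><h4>Dimensions</h4></div>
<div class="specs__title"><h4>Details</h4></div>
<div class="specs__title"><h4>Warranty / Certifications</h4></div>
"""

soup = BeautifulSoup(HTML, "html.parser")
pattern = re.compile(r"Dimensions")

# Step 1: CSS does the structural match only.
candidates = soup.select(".specs__title > h4")

# Step 2: the regex is applied to each element's text in Python.
dimensions = [tag for tag in candidates if pattern.search(tag.get_text())]

print([tag.get_text() for tag in dimensions])
```

Because the regex runs in your own code, you can use anything re supports (anchors, flags such as re.IGNORECASE, and so on).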
