dxhans5

BeautifulSoup, getting more returns than expected with regex

Using BeautifulSoup, I have the following line:

dimensions = SOUP.select(".specs__title > h4", text=re.compile(r'Dimensions'))

However, it returns more than just the tags whose text is 'Dimensions', as these results show:

[<h4>Dimensions</h4>, <h4>Details</h4>, <h4>Warranty / Certifications</h4>]

Am I using the regex incorrectly with the way SOUP works?

Answers (1)

facelessuser

The select interface doesn't accept a text keyword, so that argument has no effect and every h4 matching the selector is returned. Before we go further, note that the following assumes you are using BeautifulSoup 4.7+.

If you'd like to filter by text, you might be able to do something like this:

dimensions = SOUP.select(".specs__title > h4:contains(Dimensions)")

More information on the :contains() pseudo-class implementation is available here: https://facelessuser.github.io/soupsieve/selectors/#:contains.
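A minimal, self-contained sketch of the pseudo-class approach is below. The HTML is hypothetical (the question doesn't show the page's markup) and only mimics the `.specs__title > h4` structure. It uses `:-soup-contains()`, the name Soup Sieve 2.1+ gives this pseudo-class; `:contains()` is the older spelling, which newer versions still accept with a deprecation warning. Note that it matches substrings, so an `h4` reading "Dimensions (in)" would also be selected.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; only the
# ".specs__title > h4" structure is taken from the question.
HTML = """
<div class="specs__title"><h4>Dimensions</h4></div>
<div class="specs__title"><h4>Details</h4></div>
<div class="specs__title"><h4>Warranty / Certifications</h4></div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# :-soup-contains() keeps only elements whose text contains the given substring.
dimensions = soup.select('.specs__title > h4:-soup-contains("Dimensions")')
print(dimensions)  # [<h4>Dimensions</h4>]
```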

EDIT: To clarify, there is currently no way to pass a regex directly to a select call. To use a regex, you have to filter the matched elements after the fact. In the future there may be a way to use regex via some custom pseudo-class, but currently no such feature is available in Soup Sieve (Beautiful Soup's select implementation in 4.7+).
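A sketch of filtering after the fact, again assuming hypothetical markup that mimics the question's structure: `select()` does the CSS matching, and the regex is then applied to each matched element's text with `re.search()`. If the whole text should equal the pattern rather than merely contain it, `pattern.fullmatch()` is a stricter drop-in.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup; the real page's HTML isn't shown in the question.
HTML = """
<div class="specs__title"><h4>Dimensions</h4></div>
<div class="specs__title"><h4>Details</h4></div>
<div class="specs__title"><h4>Warranty / Certifications</h4></div>
"""

soup = BeautifulSoup(HTML, "html.parser")
pattern = re.compile(r"Dimensions")

# Step 1: select() returns every h4 matching the CSS selector.
# Step 2: keep only those whose text the regex matches.
dimensions = [
    tag for tag in soup.select(".specs__title > h4")
    if pattern.search(tag.get_text())
]
print(dimensions)  # [<h4>Dimensions</h4>]
```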
