Reputation: 127
Using BeautifulSoup, I have the following line:
dimensions = SOUP.select(".specs__title > h4", text=re.compile(r'Dimensions'))
However, it's returning more than just the tags that have a text of 'Dimensions' as shown in these results:
[<h4>Dimensions</h4>, <h4>Details</h4>, <h4>Warranty / Certifications</h4>]
Am I using the regex incorrectly with the way SOUP works?
Upvotes: 0
Views: 73
Reputation: 1734
The select interface doesn't have a text keyword. Before we go further, note that the following assumes you are using BeautifulSoup 4.7+.
If you'd like to filter by text, you might be able to do something like this:
dimensions = SOUP.select(".specs__title > h4:contains(Dimensions)")
More information on the :contains() pseudo-class implementation is available here: https://facelessuser.github.io/soupsieve/selectors/#:contains.
EDIT: To clarify, there is currently no way to incorporate regex directly into a select call. To use regex, you would have to filter the elements after the fact. In the future there may be a way to use regex via some custom pseudo-class, but no such feature is available yet in Soup Sieve (Beautiful Soup's select implementation in 4.7+).
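Filtering after the fact could look roughly like the sketch below. The helper name filter_by_text, the demo function, and the sample markup are made up for illustration (the markup is only modeled on the output in the question); the only Beautiful Soup calls used are select() and get_text().

```python
import re


def filter_by_text(elements, pattern):
    """Keep only the elements whose text matches the regex `pattern`."""
    regex = re.compile(pattern)
    # get_text() is the Beautiful Soup Tag method that returns a tag's text.
    return [el for el in elements if regex.search(el.get_text())]


def demo():
    # Imported here so the helper above has no hard dependency on bs4.
    from bs4 import BeautifulSoup

    # Hypothetical markup, modeled on the results shown in the question.
    html = """
    <div class="specs__title"><h4>Dimensions</h4></div>
    <div class="specs__title"><h4>Details</h4></div>
    <div class="specs__title"><h4>Warranty / Certifications</h4></div>
    """
    soup = BeautifulSoup(html, "html.parser")
    # Select with plain CSS first, then narrow the result with the regex.
    return filter_by_text(soup.select(".specs__title > h4"), r"Dimensions")
```

Calling demo() should leave only the h4 whose text is Dimensions. Because the regex is applied in plain Python, any pattern that re supports can be used, which is not possible inside the selector itself.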
Upvotes: 2