Is it possible to get all elements that match a specific attribute value with BeautifulSoup, regardless of the tag name or attribute name? If so, does anyone know how to do it?
Here's an example of how I'm trying to do it:
from bs4 import BeautifulSoup
import requests
text_to_match = 'https://s3-ap-southeast-2.amazonaws.com/bettss3/images/003obzt0t_w1200_h1200.jpg'
url = 'https://www.betts.com.au/item/37510-command.html?colour=chocolate'
r = requests.get(url)
bs = BeautifulSoup(r.text, features="html.parser")
possibles = bs.find_all(None, {None: text_to_match})
print(possibles)
This gives me an empty list [].
If I replace {None: text_to_match} with {'href': text_to_match}, this example gives some results, as expected. I'm trying to figure out how to do this without specifying the attribute's name, matching only on the value.
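One way to match on the value alone is to pass a function to find_all(): Beautiful Soup calls it with each tag and keeps the tags for which it returns True, so the function can inspect tag.attrs without naming any attribute. A minimal sketch, assuming a small hypothetical HTML string in place of the downloaded page (the example.com URLs are placeholders) and an illustrative helper named has_attr_value:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
html = """
<html><body>
  <a href="https://example.com/img.jpg">full size</a>
  <img src="https://example.com/img.jpg" alt="thumbnail">
  <img src="https://example.com/other.jpg">
</body></html>
"""
text_to_match = "https://example.com/img.jpg"


def has_attr_value(tag, value=text_to_match):
    """Hypothetical helper: True if any attribute of *tag* equals *value*.

    Multi-valued attributes such as class come back as lists,
    so each list item is checked individually.
    """
    for attr_value in tag.attrs.values():
        if isinstance(attr_value, list):
            if value in attr_value:
                return True
        elif attr_value == value:
            return True
    return False


bs = BeautifulSoup(html, features="html.parser")
# find_all() calls the function once per tag and keeps tags where it returns True.
possibles = bs.find_all(has_attr_value)
print(possibles)
```

Because only each tag's own attributes are inspected, the enclosing <html> and <body> tags are not returned, only the <a> and the first <img>.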
You can call find_all() with no arguments and then filter out the tags that don't match, like this:
from bs4 import BeautifulSoup
import requests
text_to_match = 'https://s3-ap-southeast-2.amazonaws.com/bettss3/images/003obzt0t_w1200_h1200.jpg'
url = 'https://www.betts.com.au/item/37510-command.html?colour=chocolate'
r = requests.get(url)
bs = BeautifulSoup(r.text, features="html.parser")
# Keep every tag whose serialized HTML contains the target string
tags = [tag for tag in bs.find_all() if text_to_match in str(tag)]
print(tags)
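To see what this substring filter returns without hitting the network, here is the same list comprehension run against a small hypothetical HTML fragment (the example.com URL is a placeholder). Because str(tag) serializes a tag together with all of its children, the enclosing <div>, <body> and <html> tags also contain the URL and are returned alongside the <img> itself:

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment standing in for r.text.
html = """
<html><body>
  <div class="gallery">
    <img src="https://example.com/img.jpg">
  </div>
  <p>Other content</p>
</body></html>
"""
text_to_match = "https://example.com/img.jpg"

bs = BeautifulSoup(html, features="html.parser")
# Same filter as the answer: keep every tag whose serialized form contains the text.
tags = [tag for tag in bs.find_all() if text_to_match in str(tag)]
print([tag.name for tag in tags])  # ['html', 'body', 'div', 'img']
```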
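This solution is a bit clumsy, since you might get some irrelevant tags, for example a tag where the URL appears only in its text content, or as the prefix of a longer URL. You can make the text a bit more tag-specific with: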
text_to_match = r'="https://s3-ap-southeast-2.amazonaws.com/bettss3/images/003obzt0t_w1200_h1200.jpg"'
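This is closer to how an attribute value appears in the string representation of a tag. Note that str(tag) also includes the tag's children, so the parent tags of a match will still be returned.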
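If you want to avoid string matching altogether, and also learn which attribute held the value, you can loop over each tag's own attributes instead. A minimal sketch, assuming a hypothetical HTML fragment in place of r.text (the example.com URLs and the data-zoom attribute are placeholders):

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment standing in for r.text.
html = """
<html><body>
  <div class="gallery">
    <a href="https://example.com/img.jpg">
      <img src="https://example.com/img.jpg" data-zoom="https://example.com/big.jpg">
    </a>
  </div>
</body></html>
"""
text_to_match = "https://example.com/img.jpg"

bs = BeautifulSoup(html, features="html.parser")

matches = []
for tag in bs.find_all():
    # tag.attrs holds only this tag's own attributes, not its children's.
    for attr_name, attr_value in tag.attrs.items():
        # Multi-valued attributes (e.g. class) are lists; normalize to a list.
        values = attr_value if isinstance(attr_value, list) else [attr_value]
        if text_to_match in values:
            matches.append((tag.name, attr_name))

print(matches)  # [('a', 'href'), ('img', 'src')]
```

Since exact equality is used, the data-zoom URL and the wrapping <div>, <body> and <html> tags are not reported.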