Ankur

Reputation: 51108

Get all elements that match a specific attribute value, but match any tag or attribute name with BeautifulSoup

Is it possible to get all elements that match a specific attribute value, regardless of tag or attribute name, with BeautifulSoup? If so, does anyone know how to do it?

Here's an example of how I'm trying to do it:

from bs4 import BeautifulSoup
import requests

text_to_match = 'https://s3-ap-southeast-2.amazonaws.com/bettss3/images/003obzt0t_w1200_h1200.jpg'
url = 'https://www.betts.com.au/item/37510-command.html?colour=chocolate'
r = requests.get(url)
bs = BeautifulSoup(r.text, features="html.parser")
possibles = bs.find_all(None, {None: text_to_match})
print(possibles)

This gives me an empty list [].

If I replace {None: text_to_match} with {'href': text_to_match} this example will give some results as expected. I'm trying to figure out how to do this without specifying the attribute's name, and only matching the value.

Upvotes: 0

Views: 413

Answers (1)

Rotem Tal

Reputation: 769

You can call find_all with no restrictions and then filter out the tags that don't match your needs, like this:

from bs4 import BeautifulSoup
import requests

text_to_match = 'https://s3-ap-southeast-2.amazonaws.com/bettss3/images/003obzt0t_w1200_h1200.jpg'
url = 'https://www.betts.com.au/item/37510-command.html?colour=chocolate'
r = requests.get(url)
bs = BeautifulSoup(r.text, features="html.parser")
# Keep every tag whose rendered markup contains the target text
tags = [tag for tag in bs.find_all() if text_to_match in str(tag)]
print(tags)

This sort of solution is a bit clumsy: str(tag) includes the tag's children, so you may get irrelevant tags (for example, every ancestor of a matching element). You can make your text a bit more tag-specific with:

text_to_match = r'="https://s3-ap-southeast-2.amazonaws.com/bettss3/images/003obzt0t_w1200_h1200.jpg"'

which is a bit closer to the str representation of a tag with an attribute.
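Another option is to compare attribute values directly instead of searching the rendered markup. This is a minimal sketch, not a definitive implementation. It assumes you want an exact match on the value, and it uses a small inline HTML string with an example.com URL as a stand-in for the real page. The helper name has_attr_value is made up for illustration. find_all accepts a function that receives each tag and returns True or False, and tag.attrs is a dict of that tag's attributes:

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for the downloaded page.
html = """
<div>
  <a href="https://example.com/img.jpg">link</a>
  <img src="https://example.com/img.jpg" class="hero big">
  <p data-note="see https://example.com/img.jpg here">text</p>
  <span title="other">https://example.com/img.jpg</span>
</div>
"""
text_to_match = "https://example.com/img.jpg"


def has_attr_value(tag, wanted=text_to_match):
    """Return True if any attribute of `tag` equals `wanted` exactly."""
    for value in tag.attrs.values():
        # Multi-valued attributes such as class are parsed into lists.
        values = value if isinstance(value, list) else [value]
        if wanted in values:
            return True
    return False


bs = BeautifulSoup(html, features="html.parser")
matches = bs.find_all(has_attr_value)  # function filter, called once per tag
names = [tag.name for tag in matches]
print(names)
```

Because only each tag's own attributes are inspected, parent elements and tags that merely contain the text are not returned. If you need substring matching rather than equality, the comparison inside the helper would have to change accordingly.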

Upvotes: 2
