divertingpie

Reputation: 873

Selecting a BeautifulSoup tag based on an attribute's value

Suppose we have an .htm page with an index followed by some content. Each entry in the index links to the related section of the document. Suppose our starting point is a tag with an href (<a href="#001">SECTION 1</a>). I want to search all tags for the one this href refers to, i.e. find any tag that has this value in some attribute. I have looked at several of these documents, and these are some examples of the tags being referred to:

  1. <a id="#001">SECTION 1</a>
  2. <a name="#001">SECTION 1</a>
  3. <div name="#001">SECTION 1</div>
  4. <div id="#001">SECTION 1</div>

Since I cannot predict either the tag name or the name of the attribute that holds the value referenced by the href, how can I make this search purely value-based? Is there a BeautifulSoup method that does this? Can I avoid writing a loop over all the attributes myself?

Upvotes: 1

Views: 31

Answers (1)

Andrej Kesely

Reputation: 195408

You can pass a lambda function to soup.find_all(). BeautifulSoup calls it for every tag and keeps the tags for which it returns True, for example:

from bs4 import BeautifulSoup

html_doc = """\
    <a id="#001">SECTION 1</a>

    <a>something other</a>

    <a name="#001">SECTION 1</a>
    <div name="#001">SECTION 1</div>

    <div name="#002">something other</div>

    <div id="#001">SECTION 1</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

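# Keep every tag that has at least one attribute whose value is "#001",
# regardless of the tag name or the attribute name
for tag in soup.find_all(lambda tag: any(value == "#001" for value in tag.attrs.values())):
    print(tag)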

Prints:

<a id="#001">SECTION 1</a>
<a name="#001">SECTION 1</a>
<div name="#001">SECTION 1</div>
<div id="#001">SECTION 1</div>
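If you want to start from the index link itself, as described in the question, a minimal sketch along these lines reads the href from each <a> tag and collects the other tags that carry a matching attribute value. The helper names attr_matches and find_targets and the sample html_doc are hypothetical. BeautifulSoup returns multi-valued attributes such as class as lists, so each item is compared. As an assumption beyond the code above, a value without the leading "#" (e.g. id="001") is also accepted, because in standard HTML href="#001" points at the element whose id is "001".

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: an index of links followed by the sections
html_doc = """\
<a href="#001">SECTION 1</a>
<a href="#002">SECTION 2</a>

<div id="#001">SECTION 1 content</div>
<div name="002">SECTION 2 content</div>
"""


def attr_matches(value, target):
    """Hypothetical helper: compare one attribute value with the target.

    Multi-valued attributes (e.g. class) come back as lists, so every
    item is checked. Accepting the target without its leading "#" is an
    assumption, not something BeautifulSoup does on its own.
    """
    candidates = value if isinstance(value, list) else [value]
    return any(c == target or c == target.lstrip("#") for c in candidates)


def find_targets(soup, link):
    """Hypothetical helper: return every tag, except the link itself,
    that has an attribute value matching the link's href."""
    target = link["href"]
    return soup.find_all(
        lambda tag: tag is not link
        and any(attr_matches(v, target) for v in tag.attrs.values())
    )


soup = BeautifulSoup(html_doc, "html.parser")
# Resolve every index link that has an href to the tags it points at
for link in soup.find_all("a", href=True):
    print(link["href"], "->", find_targets(soup, link))
```

The `tag is not link` check is needed because the link's own href attribute also equals the target. Removing the `lstrip("#")` comparison gives the exact-match behaviour of the code above.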

Upvotes: 1
