khanhnguyendata

Reputation: 497

BeautifulSoup: Find any tag that matches one tag by name, or another tag by attribute

I want to find all tags that are either <h1> OR <div class='abc'>. I tried bs.find_all(['h1', 'div'], attrs={'class': 'abc'}), but this ignores the <h1> tags. Apparently, the attrs argument is combined with the tag names using an AND condition: a tag must match one of the names in the list AND have the given attribute, which the <h1> tags do not.

Can anyone suggest a fix to this? Thank you.

Upvotes: 1

Views: 580

Answers (2)

wong.lok.yin

Reputation: 889

How about concatenating two results, like bs.find_all('h1') + bs.find_all('div', attrs={'class': 'abc'})?
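A minimal runnable sketch of the concatenation approach, assuming BeautifulSoup 4 with the built-in 'html.parser'. The sample markup is made up for illustration. Because each find_all call is a separate search, the combined list is grouped by search (all <h1> tags first, then the matching <div> tags), not by position in the document.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the matching div comes BEFORE the h1 in the document.
html = """<html><body>
<div class='abc'><p>First div</p></div>
<h1>Heading</h1>
<div class='xyz'><p>Other div</p></div>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

# Search 1: by name only. Search 2: by name AND class.
# find_all returns a ResultSet (a list subclass), so "+" yields a plain list.
matches = soup.find_all("h1") + soup.find_all("div", attrs={"class": "abc"})
print(matches)
```

If document order matters, the select or function-based approaches are a better fit.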

Upvotes: 0

abhilb

Reputation: 5757

Maybe you can use select with a comma-separated CSS selector, which matches elements that satisfy either selector:

from bs4 import BeautifulSoup as bs
from io import StringIO

data = """<html>
<body>
<h1>Test 1</h1>
<h2>Test 2</h2>
<div class='abc'><p>Test 3</p></div>
</body>
</html>"""

soup = bs(StringIO(data), 'html.parser')
print(soup.select('h1,div[class="abc"]'))
print(soup.find_all(['h1', 'div'], attrs={'class' : 'abc'}))

Output:

[<h1>Test 1</h1>, <div class="abc"><p>Test 3</p></div>]
[<div class="abc"><p>Test 3</p></div>]
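Another option, not from the answers above, is to pass a function to find_all: BeautifulSoup calls it on each tag and keeps the tags for which it returns True, in document order. This sketch assumes BeautifulSoup 4 with 'html.parser' and reuses the sample markup above; the function name h1_or_abc_div is hypothetical.

```python
from bs4 import BeautifulSoup

html = """<html>
<body>
<h1>Test 1</h1>
<h2>Test 2</h2>
<div class='abc'><p>Test 3</p></div>
</body>
</html>"""

soup = BeautifulSoup(html, "html.parser")

def h1_or_abc_div(tag):
    """Hypothetical filter: True for <h1>, or for <div> whose class list contains 'abc'."""
    if tag.name == "h1":
        return True
    # With html.parser, tag.get("class") is a list of class names, or None if absent.
    return tag.name == "div" and "abc" in (tag.get("class") or [])

# find_all calls the function on every tag and keeps those where it returns True.
result = soup.find_all(h1_or_abc_div)
print(result)
```

Note that this checks class membership, so it would also match <div class="abc other">, whereas div[class="abc"] requires the class attribute to be exactly "abc". The CSS equivalent of the membership check is soup.select('h1, div.abc').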

Upvotes: 3
