Reputation: 497
I want to find all tags that are either <h1> OR <div class='abc'>. I tried bs.find_all(['h1', 'div'], attrs={'class': 'abc'}), but this ignores the <h1> tags: apparently, the attrs argument applies an AND condition to the search (meaning the tag must be in the list of tag names AND have the given attribute, which the <h1> tags do not).
Can anyone suggest a fix to this? Thank you.
Upvotes: 1
Views: 580
Reputation: 889
How about concatenating the two results, like bs.find_all('h1') + bs.find_all('div', attrs={'class': 'abc'})?
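The concatenation idea can be sketched as below. This is only an illustration: the sample HTML and the names html, soup, and combined are assumptions, not taken from the question. One thing it shows is that the combined list is grouped by query rather than kept in document order, which may or may not matter for your use case.

```python
from bs4 import BeautifulSoup

# Assumed sample document: the matching <div> comes BEFORE the <h1>
html = "<div class='abc'>first</div><h1>second</h1><div class='xyz'>third</div>"
soup = BeautifulSoup(html, "html.parser")

# Run the two searches separately and join the results.
# find_all returns a list-like ResultSet, so + concatenates them.
combined = soup.find_all("h1") + soup.find_all("div", attrs={"class": "abc"})

# All <h1> matches come first, then the <div class="abc"> matches,
# regardless of where they appear in the document.
print([tag.name for tag in combined])
```

If document order matters, a single search that expresses the OR condition directly is probably a better fit.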
Upvotes: 0
Reputation: 5757
Maybe you can use select. It takes a CSS selector, where a comma between selectors means OR.
from bs4 import BeautifulSoup as bs
from io import StringIO
data = """<html>
<body>
<h1>Test 1</h1>
<h2>Test 2</h2>
<div class='abc'><p>Test 3</p></div>
</body>
</html>"""
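# BeautifulSoup accepts a file-like object; passing the string data directly also works
soup = bs(StringIO(data), 'html.parser')
# CSS selector: the comma means "h1 OR div with class attribute equal to abc"
print(soup.select('h1,div[class="abc"]'))
# find_all: the tag name must be in the list AND the class must match
print(soup.find_all(['h1', 'div'], attrs={'class': 'abc'}))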
Output:
[<h1>Test 1</h1>, <div class="abc"><p>Test 3</p></div>]
[<div class="abc"><p>Test 3</p></div>]
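If you would rather stay with find_all, another option is to pass it a function as the filter, which lets you write the OR condition yourself. The sketch below is one way to do that; the helper name is_h1_or_abc_div and the extra <div class='xyz'> in the sample data are made up for illustration.

```python
from bs4 import BeautifulSoup

html = """<html><body>
<h1>Test 1</h1>
<h2>Test 2</h2>
<div class='abc'><p>Test 3</p></div>
<div class='xyz'>Test 4</div>
</body></html>"""
soup = BeautifulSoup(html, "html.parser")

def is_h1_or_abc_div(tag):
    # Hypothetical helper: True for any <h1>, or for a <div> whose class list contains "abc".
    # With html.parser, tag.get("class") is a list of class names (or missing entirely).
    return tag.name == "h1" or (tag.name == "div" and "abc" in tag.get("class", []))

# find_all calls the function on every tag and keeps those for which it returns True
matches = soup.find_all(is_h1_or_abc_div)
print(matches)
```

Unlike concatenating two searches, this keeps the matches in document order. Also note that the selector div[class="abc"] only matches when the class attribute is exactly "abc"; the selector div.abc also matches elements that carry additional classes, which is what the function above does too.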
Upvotes: 3