Let's say I have an HTML file with divs like these:
<div class="message" title="user1"> <span> Hey </span> </div>
<div class="message" title="user1"> <span> It's me </span> </div>
<div class="message" title="user2"> <span> Hi </span> </div>
<div class="message" title="user3"> <span> Ola </span> </div>
How can I get a list of all users who sent messages?
If I use the find method, I get only the first user; if I use find_all, I get user1 twice.
Can I do this in one step, without having to remove the duplicates from the list returned by find_all?
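To make the behaviour concrete, here is a small sketch. It assumes bs4 is installed and stores the sample markup above in a string named `html` (a name chosen only for this illustration). `find` stops at the first matching tag, while `find_all` returns every matching tag, so user1 shows up twice:

```python
import bs4

html = '''<div class="message" title="user1"> <span> Hey </span> </div>
<div class="message" title="user1"> <span> It's me </span> </div>
<div class="message" title="user2"> <span> Hi </span> </div>
<div class="message" title="user3"> <span> Ola </span> </div>'''

soup = bs4.BeautifulSoup(html, 'html.parser')

# find() returns only the first matching tag (or None if nothing matches)
first = soup.find('div', class_='message')
print(first.get('title'))  # user1

# find_all() returns every matching tag, so repeated senders appear repeatedly
titles = [tag.get('title') for tag in soup.find_all('div', class_='message')]
print(titles)  # ['user1', 'user1', 'user2', 'user3']
```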
Here are the two ways I can think of:
import bs4

r = '''<div class="message" title="user1"> <span> Hey </span> </div>
<div class="message" title="user1"> <span> It's me </span> </div>
<div class="message" title="user2"> <span> Hi </span> </div>
<div class="message" title="user3"> <span> Ola </span> </div>'''
soup = bs4.BeautifulSoup(r, 'html.parser')
messages = soup.find_all('div', {'class': 'message'})

users_list = []
for user in messages:
    user_id = user.get('title')
    # add each user only the first time it is seen
    if user_id not in users_list:
        users_list.append(user_id)
or
import bs4

r = '''<div class="message" title="user1"> <span> Hey </span> </div>
<div class="message" title="user1"> <span> It's me </span> </div>
<div class="message" title="user2"> <span> Hi </span> </div>
<div class="message" title="user3"> <span> Ola </span> </div>'''
soup = bs4.BeautifulSoup(r, 'html.parser')
messages = soup.find_all('div', {'class': 'message'})

# a set drops the duplicates, but does not keep the original order
users_list = list({user.get('title') for user in messages})
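Going through a set loses the order in which users first appear. If that order matters, one option (a sketch, not part of the original answer) is `dict.fromkeys`, since dictionaries preserve insertion order from Python 3.7 onward. The list `titles` below stands in for the titles extracted from `messages`:

```python
# Titles as they come out of find_all, duplicates included
titles = ['user1', 'user1', 'user2', 'user3']

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops repeats while keeping first-seen order
users_list = list(dict.fromkeys(titles))
print(users_list)  # ['user1', 'user2', 'user3']
```

Applied to the answer above, that would be `list(dict.fromkeys(user.get('title') for user in messages))`.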
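You could use a custom finder function. find_all accepts a function that is called with each tag and keeps the tag when the function returns True, so the function can also record each username in a set as it goes: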
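# assumes `soup` was built from the HTML as in the answer above
seen_users = set()

def users(tag):
    username = tag.get('title')
    # bs4 returns class as a list, so this checks for the "message" class
    if username and 'message' in tag.get('class', []):
        seen_users.add(username)
        return True
    return False

tags = soup.find_all(users)
print(seen_users)  # {'user1', 'user2', 'user3'} (set order may vary)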
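The finder above relies on a module-level `seen_users` set, which keeps growing if `find_all` is called again. One variant that keeps the set local is to build the finder inside a factory function. `make_user_finder` is a hypothetical name used only for this sketch, and it assumes bs4 is installed:

```python
import bs4

html = '''<div class="message" title="user1"> <span> Hey </span> </div>
<div class="message" title="user1"> <span> It's me </span> </div>
<div class="message" title="user2"> <span> Hi </span> </div>
<div class="message" title="user3"> <span> Ola </span> </div>'''


def make_user_finder():
    """Return (finder, seen): finder is a find_all filter, seen collects titles."""
    seen = set()

    def finder(tag):
        username = tag.get('title')
        # class is multi-valued, so bs4 returns it as a list
        if username and 'message' in tag.get('class', []):
            seen.add(username)
            return True
        return False

    return finder, seen


soup = bs4.BeautifulSoup(html, 'html.parser')
finder, seen_users = make_user_finder()
tags = soup.find_all(finder)
print(sorted(seen_users))  # ['user1', 'user2', 'user3']
```

Each call to the factory gets a fresh set, so parsing several documents does not mix their users together.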