Reputation: 13
I want to remove duplicate titles from the output. I am using Beautiful Soup to scrape the titles.
#!/usr/bin/python
from bs4 import BeautifulSoup
import requests

source = requests.get('https://itrevolution.com/book-downloads-extra-materials/')
source = source.text
soup = BeautifulSoup(source, 'lxml')
for tl in soup.find_all('img', class_='responsive-img hover-img'):
    title = set()
    title = tl.get('title')
    print('{}'.format(title))
Output from the above script:
Accelerate
Team Topologies
Accelerate
Project to Product
War and Peace and IT
A Seat at the Table
The Art of Business Value
DevOps for the Modern Enterprise
Making Work Visible
Leading the Transformation
The DevOps Handbook
The Phoenix Project
Beyond the Phoenix Project
The title Accelerate appears twice, but it should appear only once.
Upvotes: 1
Views: 588
Reputation: 3473
You were on the right track: taking advantage of a set() is a great idea. Just create it before the for-loop, and add titles to it using the set.add() method. See the following:
from bs4 import BeautifulSoup
import requests

source = requests.get('https://itrevolution.com/book-downloads-extra-materials/')
source = source.text
soup = BeautifulSoup(source, 'lxml')
titles = set()
for tl in soup.find_all('img', class_='responsive-img hover-img'):
    title = tl.get('title')
    titles.add(title)
print(titles)
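One caveat: a set has no defined order, so print(titles) may not list the books in the order they appear on the page. If the order matters, a minimal sketch is to use the set only to remember what has already been seen and collect the titles in a list. The helper name unique_in_order and the hard-coded sample list are assumptions made for illustration; in the real script the values would come from tl.get('title') inside the loop.

```python
# Sample titles copied from the question's output; in the real script these
# would be the values returned by tl.get('title') for each <img> tag.
scraped_titles = [
    'Accelerate',
    'Team Topologies',
    'Accelerate',
    'Project to Product',
]


def unique_in_order(titles):
    """Return titles with duplicates dropped, keeping first-seen order."""
    seen = set()
    result = []
    for title in titles:
        # Skip repeats, and skip None (what .get() returns when the
        # attribute is missing on a tag).
        if title is None or title in seen:
            continue
        seen.add(title)
        result.append(title)
    return result


for title in unique_in_order(scraped_titles):
    print(title)
```

This prints one title per line, like the original script, with the second Accelerate dropped.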
Upvotes: 1
Reputation: 955
If you need a distinct list, here is a slight modification to your code:
from bs4 import BeautifulSoup
import requests

source = requests.get('https://itrevolution.com/book-downloads-extra-materials/')
source = source.text
soup = BeautifulSoup(source, 'lxml')
title = []
for tl in soup.find_all('img', class_='responsive-img hover-img'):
    title.append(tl.get('title'))
distinctTitle = list(set(title))
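Note that list(set(title)) returns the distinct titles in arbitrary order. If the page order should be kept, one possible variation is dict.fromkeys, which keeps the first occurrence of each key and preserves insertion order on Python 3.7+. The sample list below stands in for the scraped titles and is only an assumption for illustration.

```python
# Stand-in for the list built by title.append(tl.get('title')) in the loop.
titles = ['Accelerate', 'Team Topologies', 'Accelerate', 'Project to Product']

# dict keys are unique and keep insertion order (Python 3.7+), so this
# drops later duplicates while preserving the original sequence.
distinct_titles = list(dict.fromkeys(titles))

print(distinct_titles)
# ['Accelerate', 'Team Topologies', 'Project to Product']
```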
Upvotes: 1