stackho gusain

Reputation: 13

How to remove duplicate titles while scraping them from a web page

I want to remove duplicate titles from the output. I am using Beautiful Soup to scrape the titles.

#!/usr/bin/python

from bs4 import BeautifulSoup
import requests


source = requests.get('https://itrevolution.com/book-downloads-extra-materials/')
source = source.text

soup = BeautifulSoup(source, 'lxml')

for tl in soup.find_all('img', class_='responsive-img hover-img'):
    title = set()
    title = tl.get('title')
    print('{}'.format(title))

Output from the above script:

Accelerate
Team Topologies
Accelerate
Project to Product
War and Peace and IT
A Seat at the Table
The Art of Business Value
DevOps for the Modern Enterprise
Making Work Visible
Leading the Transformation
The DevOps Handbook
The Phoenix Project
Beyond the Phoenix Project

The title Accelerate appears twice, but it should appear only once.

Upvotes: 1

Views: 588

Answers (2)

arnaud

Reputation: 3473

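You were on the right track: using a set() is a great idea. Just create it once, before the for loop, and add each title to it with set.add(). In your code the set is recreated on every iteration and then immediately overwritten by the title string, so it never collects anything. See the following: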

from bs4 import BeautifulSoup
import requests

source = requests.get('https://itrevolution.com/book-downloads-extra-materials/')
source = source.text

soup = BeautifulSoup(source, 'lxml')
titles = set()

for tl in soup.find_all('img', class_='responsive-img hover-img'):
    title = tl.get('title')
    titles.add(title)

print(titles)
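
One caveat: a set does not keep insertion order, so print(titles) may list the books in a different order than the page does. If the order matters, here is a minimal sketch, assuming Python 3.7+ (where dicts preserve insertion order). The hard-coded scraped_titles list is only a stand-in for the values collected in the loop above:

```python
# Sample titles copied from the question's output; in the real script these
# would be the values returned by tl.get('title') inside the loop.
scraped_titles = [
    'Accelerate',
    'Team Topologies',
    'Accelerate',
    'Project to Product',
    'War and Peace and IT',
]

# dict keys are unique and keep insertion order (Python 3.7+), so this
# drops repeats while keeping the first occurrence of each title.
unique_titles = list(dict.fromkeys(scraped_titles))

for title in unique_titles:
    print(title)
```

This prints each title once, in the order it first appeared.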

Upvotes: 1

Prakhar Jhudele

Reputation: 955

If you need a distinct list, here is a slight modification to your code:

from bs4 import BeautifulSoup
import requests


source = requests.get('https://itrevolution.com/book-downloads-extra-materials/')
source = source.text

soup = BeautifulSoup(source, 'lxml')
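titles = []
for tl in soup.find_all('img', class_='responsive-img hover-img'):
    titles.append(tl.get('title'))

# Converting to a set drops duplicates; the resulting order is arbitrary.
distinct_titles = list(set(titles))
print(distinct_titles)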
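
Note that list(set(...)) returns the titles in arbitrary order, and tl.get('title') returns None for an image that has no title attribute. A small sketch that handles both cases, using a hypothetical sample list (scraped_titles) in place of the scraped values:

```python
# Hypothetical sample input; None stands in for an <img> tag without a
# title attribute, since tl.get('title') returns None in that case.
scraped_titles = ['Accelerate', 'Team Topologies', None, 'Accelerate', 'The Phoenix Project']

# Drop missing titles, de-duplicate with a set comprehension, then sort
# so the output order is stable from run to run.
distinct_titles = sorted({t for t in scraped_titles if t is not None})

print(distinct_titles)
```

Sorting gives alphabetical order rather than page order, so this fits best when you only need a stable, repeatable listing.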

Upvotes: 1
