Trouble getting rid of duplicate links

Question

I've tried this with many different links, but every time I get the same result: the first link always shows up again at the end of the output.

import requests
from lxml import html
Unique=[]
url="https://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los+Angeles%2C+CA"
def DupRemoval(Address):
    MainLink="https://www.yellowpages.com"
    response = requests.get(Address)
    Unique.append(Address)
    tree=html.fromstring(response.text)
    Pagination_link=tree.xpath("//div[@class='pagination']//a/@href")
    for Nextpage in Pagination_link:
        Blink=MainLink+Nextpage
        if Blink not in Unique:
            print(Blink)

DupRemoval(url)

Produced Links:

alecxe · Accepted Answer

The duplicate link is the "Next" button, which is the last link in the pagination block and points to the same URL as the link to the following page. If you advance to later pages, you'll get the "Previous" link there as well.

A quick way to filter these out is to select only the a elements that have no class attribute:

//div[@class='pagination']//a[not(@class)]/@href
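
To see the effect of that expression without hitting the site, here is a minimal sketch run against a hand-written snippet. The markup in SAMPLE is an assumption (in particular the class name on the "Next" link is made up; the real page may differ), and pagination_links is a hypothetical helper name, not part of the original code. It also passes the result through dict.fromkeys as a safety net, which drops any remaining repeats while keeping the order.

```python
from urllib.parse import urljoin

from lxml import html

BASE = "https://www.yellowpages.com"

# Hypothetical markup, reduced to what the XPath needs. The assumption is
# that numbered page links carry no class attribute while "Next" does.
SAMPLE = """
<html><body>
<div class="pagination">
  <ul>
    <li><span class="disabled">1</span></li>
    <li><a href="/search?search_terms=coffee&amp;page=2">2</a></li>
    <li><a href="/search?search_terms=coffee&amp;page=3">3</a></li>
    <li><a class="next" href="/search?search_terms=coffee&amp;page=2">Next</a></li>
  </ul>
</div>
</body></html>
"""


def pagination_links(page_source, base=BASE):
    """Return absolute pagination URLs, skipping links that have a class."""
    tree = html.fromstring(page_source)
    hrefs = tree.xpath("//div[@class='pagination']//a[not(@class)]/@href")
    # dict.fromkeys removes any repeats that are left and preserves order.
    return list(dict.fromkeys(urljoin(base, href) for href in hrefs))


# The unfiltered expression from the question also picks up the "Next" link.
all_hrefs = html.fromstring(SAMPLE).xpath("//div[@class='pagination']//a/@href")
print(len(all_hrefs), "links before filtering")

for link in pagination_links(SAMPLE):
    print(link)
```

Note that in the original function Blink is never added to Unique, so the "not in Unique" check can only ever exclude the starting URL. Filtering in the XPath, or de-duplicating the collected list as above, avoids relying on that check.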
