Reputation: 22440
I have tried this with many different links, but I get the same result every time: the first link always shows up again as the last one.
import requests
from lxml import html

Unique=[]
url="https://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los+Angeles%2C+CA"

def DupRemoval(Address):
    MainLink="https://www.yellowpages.com"
    response = requests.get(Address)
    Unique.append(Address)
    tree=html.fromstring(response.text)
    Pagination_link=tree.xpath("//div[@class='pagination']//a/@href")
    for Nextpage in Pagination_link:
        Blink=MainLink+Nextpage
        if Blink not in Unique:
            print(Blink)

DupRemoval(url)
Produced Links:
Upvotes: 2
Views: 107
Reputation: 473863
The duplicate is the link behind the "Next" button, which is the last one in the pagination block and points to the same page as one of the numbered links. On later pages you will also get the "Previous" link.
A quick way to filter it out would be to select only the a elements that have no class attribute:
//div[@class='pagination']//a[not(@class)]/@href
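As a rough sketch of how that expression could be used, the snippet below runs it against a small inline HTML sample instead of the live site. The sample markup, the class names on the "Previous"/"Next" anchors, and the helper name pagination_links are all made up for illustration; the real page may be structured differently. It also drops any remaining repeats while keeping the original order, in case two unclassed anchors ever share an href.

```python
from urllib.parse import urljoin

from lxml import html

BASE_URL = "https://www.yellowpages.com"

# Hypothetical pagination markup (an assumption, not copied from the site):
# only the "Previous" and "Next" anchors carry a class attribute.
SAMPLE_HTML = """
<html><body>
<div class="pagination">
  <ul>
    <li><a class="prev ajax-page" href="/search?search_terms=coffee&amp;page=1">Previous</a></li>
    <li><a href="/search?search_terms=coffee&amp;page=1">1</a></li>
    <li><span class="disabled">2</span></li>
    <li><a href="/search?search_terms=coffee&amp;page=3">3</a></li>
    <li><a class="next ajax-page" href="/search?search_terms=coffee&amp;page=3">Next</a></li>
  </ul>
</div>
</body></html>
"""


def pagination_links(page_source, base_url=BASE_URL):
    """Return absolute pagination URLs, skipping anchors that have a class."""
    tree = html.fromstring(page_source)
    # a[not(@class)] keeps only anchors with no class attribute at all.
    hrefs = tree.xpath("//div[@class='pagination']//a[not(@class)]/@href")
    absolute = [urljoin(base_url, href) for href in hrefs]
    # dict.fromkeys removes repeats while preserving first-seen order.
    return list(dict.fromkeys(absolute))


links = pagination_links(SAMPLE_HTML)
for link in links:
    print(link)
```

With the sample above, only the two numbered links survive; the "Previous" and "Next" entries are skipped because they have a class attribute. To try it against the real page, pass response.text from requests.get(url) instead of SAMPLE_HTML.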
Upvotes: 1