zesla

Reputation: 11793

Issues fetching href links from the Amazon website: XPath finds many more href links than expected

I'm trying to collect the URL of each video from the Amazon page below.

https://www.amazon.com/video-Prime/s?ie=UTF8&page=1&rh=n%3A2858778011%2Ck%3Avideo

I'm using the Scrapy shell to test my code interactively. I started it like this:

scrapy shell 'https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Dinstant-video&field-keywords=video&rh=n%3A2858778011%2Ck%3Avideo'

The response status is 200. Then, in the Scrapy shell, I tried to extract all the video URLs with an XPath selector:

response.xpath("//ul[contains(@id, 's-results-list-atf')]/li//a/@href").extract()   

I got far more href links than expected, which does not make sense when I check the page's HTML: there are ten videos on that page and only one href link for each video. I cannot understand why this happens. I'd appreciate any help. Thanks a lot in advance.

Upvotes: 1

Views: 382

Answers (2)

Andersson

Reputation: 52665

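Try the XPath below to match only the required links. It keeps only the a elements that have an h2 child, which is presumably the title link of each result: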

//ul[@id="s-results-list-atf"]//a[h2]/@href
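To illustrate why the a[h2] predicate narrows the result, here is a minimal sketch on invented markup. The HTML below is hypothetical (the real Amazon markup is different and changes over time), and it uses the standard library's xml.etree.ElementTree instead of Scrapy so it runs without a network request. ElementTree supports only a subset of XPath, so the attribute is read with .get("href") rather than /@href.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the results list (not real Amazon markup).
SAMPLE = """
<html><body>
<ul id="s-results-list-atf">
  <li id="result_0">
    <a href="/dp/A1"><img src="a1.jpg"/></a>
    <a href="/dp/A1"><h2>Arrival</h2></a>
    <a href="/dp/A1#reviews">1,234</a>
  </li>
  <li id="result_1">
    <a href="/dp/B2"><img src="b2.jpg"/></a>
    <a href="/dp/B2"><h2>Second Title</h2></a>
  </li>
</ul>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# Broad query, similar in spirit to the one in the question: every <a> under each <li>.
all_links = [a.get("href")
             for a in root.findall(".//ul[@id='s-results-list-atf']/li//a")]

# Narrow query: only <a> elements that have an <h2> as a direct child.
title_links = [a.get("href")
               for a in root.findall(".//ul[@id='s-results-list-atf']//a[h2]")]

print(len(all_links), "links in total")
print(title_links)
```

In the Scrapy shell, the equivalent call would be response.xpath('//ul[@id="s-results-list-atf"]//a[h2]/@href').extract(), assuming the live page still wraps each title in an h2 inside the link.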

Upvotes: 1

stranac

Reputation: 28236

"There are ten videos on that page and only one href link for each video."

Are you sure you're looking at the correct page?
Here's a screenshot of the first result I see on that page, with borders added around links.

[Screenshot: the search result for "Arrival", with a border around each link]

As you can see, there are 9 links in total for this particular item.

It looks like you'll have to make your XPath more specific, so that it only captures the links you want.
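One way to check this yourself is to count the links per result item before refining the XPath. The sketch below does that on invented markup: the HTML is hypothetical, not the real Amazon page, and it uses the standard library's xml.etree.ElementTree so it can run offline.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one result item containing several links.
SAMPLE = """
<ul id="s-results-list-atf">
  <li id="result_0">
    <a href="/dp/A1"><img src="a1.jpg"/></a>
    <a href="/dp/A1"><h2>Arrival</h2></a>
    <a href="/dp/A1#reviews">1,234</a>
    <a href="/help/prime">Prime</a>
  </li>
  <li id="result_1">
    <a href="/dp/B2"><h2>Second Title</h2></a>
  </li>
</ul>
""".strip()

root = ET.fromstring(SAMPLE)

# Map each result item's id to the number of <a> elements nested inside it.
links_per_item = {
    li.get("id"): len(li.findall(".//a"))
    for li in root.findall("./li")
}

for item_id, count in links_per_item.items():
    print(item_id, "contains", count, "links")
```

If an item reports more than one link, the broad //li//a query returns all of them, which would explain the extra results.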

Upvotes: 0
