Reputation: 11793
I'm trying to collect the URL of each video from the Amazon page below.
https://www.amazon.com/video-Prime/s?ie=UTF8&page=1&rh=n%3A2858778011%2Ck%3Avideo
I'm using the Scrapy shell to test my code interactively. I started it as shown below:
scrapy shell 'https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Dinstant-video&field-keywords=video&rh=n%3A2858778011%2Ck%3Avideo'
The response status is 200. Then, in the Scrapy shell, I tried to extract all the video URLs with an XPath selector like this:
response.xpath("//ul[contains(@id, 's-results-list-atf')]/li//a/@href").extract()
I got far more href links than expected. That does not make sense to me after checking the page's HTML. There are ten videos on that page and only one href link for each video. I cannot understand why this happens. I would appreciate any help. Thanks a lot in advance.
Upvotes: 1
Views: 382
Reputation: 52665
Try the XPath below to match only the required links:
//ul[@id="s-results-list-atf"]//a[h2]/@href
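The predicate a[h2] keeps only anchors that have an h2 child, which is presumably the title link of each result. As a rough offline sketch of the difference, the snippet below uses Python's standard xml.etree.ElementTree instead of Scrapy, on a hypothetical, heavily simplified stand-in for the result list (the real Amazon markup and hrefs will differ):

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the result list; the real Amazon
# markup is much larger and the hrefs here are made up for illustration.
SAMPLE = """
<ul id="s-results-list-atf">
  <li>
    <a href="/dp/EXAMPLE1"><img src="cover1.jpg" /></a>
    <a href="/dp/EXAMPLE1"><h2>Example Video</h2></a>
    <a href="/reviews/EXAMPLE1">42 reviews</a>
  </li>
  <li>
    <a href="/dp/EXAMPLE2"><img src="cover2.jpg" /></a>
    <a href="/dp/EXAMPLE2"><h2>Another Video</h2></a>
  </li>
</ul>
""".strip()

root = ET.fromstring(SAMPLE)  # root is the <ul> element

# Broad: every <a> below each <li>, similar to the expression in the question.
all_links = [a.get("href") for a in root.findall("./li//a")]

# Narrow: only <a> elements that have an <h2> child, as in a[h2].
title_links = [a.get("href") for a in root.findall(".//a[h2]")]

print(len(all_links), all_links)
print(len(title_links), title_links)
```

In the Scrapy shell the same idea would be expressed by passing the XPath above to response.xpath(...).extract(); whether every title link on the live page actually wraps an h2 is an assumption worth checking against the real HTML.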
Upvotes: 1
Reputation: 28236
"There are ten videos on that page and only one href link for each video."
Are you sure you're looking at the correct page?
Here's a screenshot of the first result I see on that page, with borders added around links.
As you can see, there are 9 links in total for this particular item.
It looks like you'll have to make your XPath more specific, so that it captures only the links you want.
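One way to check how many links each result really contains is to group the hrefs by item before tightening the expression. The following is only a sketch using Python's standard xml.etree.ElementTree on made-up markup; the helper name links_per_item, the result_N ids, and the hrefs are all hypothetical, not taken from the real page:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; ids and hrefs are invented for illustration.
SAMPLE = """
<ul id="s-results-list-atf">
  <li id="result_0">
    <a href="/dp/EXAMPLE1"><img src="cover1.jpg" /></a>
    <a href="/dp/EXAMPLE1"><h2>Example Video</h2></a>
    <a href="/reviews/EXAMPLE1">42 reviews</a>
  </li>
  <li id="result_1">
    <a href="/dp/EXAMPLE2"><h2>Another Video</h2></a>
  </li>
</ul>
""".strip()


def links_per_item(markup):
    """Map each result item's id to the list of hrefs found inside it."""
    results = ET.fromstring(markup)
    return {
        li.get("id"): [a.get("href") for a in li.iter("a")]
        for li in results.findall("li")
    }


report = links_per_item(SAMPLE)
for item_id, hrefs in report.items():
    # More than one href per item means the selector needs narrowing.
    print(item_id, len(hrefs), hrefs)
```

Seeing several hrefs under a single item, some of them duplicates, is the same effect as the bordered links in the screenshot.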
Upvotes: 0