Reputation: 107
I am experimenting with Scrapy, and my spider is returning duplicate results. I am trying to scrape URLs from a parent page and follow each one to obtain an associated date. Instead, after scraping each nested URL, the spider outputs the full list of URLs from the parent page again.
Here is the script:
import scrapy
from aeon.items import AeonItem
from scrapy.http.request import Request


class AeonSpider(scrapy.Spider):
    name = "aeon"
    allowed_domains = ["aeon.co"]
    start_urls = [
        "http://aeon.co/magazine/technology"
    ]

    def parse(self, response):
        items = []
        for sel in response.xpath('//*[@id="latestPosts"]'):
            item = AeonItem()
            item['primary_url'] = sel.xpath('div/div/div/a/@href').extract()
            for each in item['primary_url']:
                yield Request(each, callback=self.parse_next_page, meta={'item': item})

    def parse_next_page(self, response):
        for sel in response.xpath('//*[@id="top"]'):
            item = response.meta['item']
            item['comments'] = sel.xpath('div[5]/div[3]/div[2]/div/p/em/span[@class="instapaper_datepublished"]/text()').extract()
            return item
Here is the json output:
{"comments": ["13 February 2014"], "primary_url": ["http://aeon.co/magazine/science/the-search-for-quantum-gravity/", "http://aeon.co/magazine/philosophy/should-generation-ted-take-a-more-sceptical-view/", "http://aeon.co/magazine/technology/the-elon-musk-interview-on-mars/", "http://aeon.co/video/technology/analogue-people-in-a-digital-age-a-short-film-about-technology/", "http://aeon.co/video/technology/boxa-short-film-about-projection-mapping/", "http://aeon.co/video/technology/how-to-sharpen-pencils-a-short-film-about-a-master-artisan/", "http://aeon.co/magazine/technology/do-we-want-minority-report-policing/", "http://aeon.co/magazine/health/can-you-have-self-worth-without-self-love/", "http://aeon.co/magazine/technology/i-learnt-to-survive-like-an-11th-century-farmer/", "http://aeon.co/magazine/technology/can-tiny-plankton-help-reverse-climate-change/", "http://aeon.co/magazine/technology/are-halophytes-the-crop-of-the-future/", "http://aeon.co/magazine/technology/how-will-sexbots-change-human-relationships/", "http://aeon.co/video/technology/robotic-cheetah-a-short-film-about-biomimetic-robotics/", "http://aeon.co/video/technology/internet-archive-a-short-film-about-accessing-knowledge/", "http://aeon.co/video/technology/a-tiny-planet-a-short-film-about-wondrous-video-technology/", "http://aeon.co/magazine/culture/there-is-fortuitous-beauty-in-a-brute-force-attack/", "http://aeon.co/magazine/technology/can-we-design-systems-to-automate-ethics/", "http://aeon.co/magazine/technology/before-minecraft-or-snapchat-there-was-micromuse/", "http://aeon.co/magazine/technology/meet-darpas-new-generation-of-humanoid-robots/", "http://aeon.co/magazine/technology/the-problem-with-too-much-information/", "http://aeon.co/magazine/technology/can-nyc-be-completely-self-reliant/", "http://aeon.co/video/technology/theo-a-short-film-about-the-wind-eating-strandbeest/", "http://aeon.co/video/technology/terminal-a-short-film-about-the-mechanical-ballet-of-cargo/", 
"http://aeon.co/video/technology/metropolis-ii-a-short-film-about-the-city-of-tomorrow/", "http://aeon.co/magazine/culture/digital-art-should-be-about-possibilities-not-technicalities/", "http://aeon.co/magazine/society/can-sustainability-really-hope-to-beat-consumerism/", "http://aeon.co/magazine/technology/is-technology-making-the-world-too-complex/", "http://aeon.co/magazine/culture/creepypasta-is-how-the-internet-learns-our-fears/", "http://aeon.co/magazine/technology/virtual-afterlives-will-transform-humanity/", "http://aeon.co/magazine/technology/what-will-happen-to-my-online-identity-when-i-die/", "http://aeon.co/magazine/society/what-does-silicon-valley-tell-us-about-innovation/", "http://aeon.co/magazine/technology/the-rise-of-biotechnology-and-the-loss-of-scientific-neutrality/", "http://aeon.co/magazine/culture/why-i-gave-up-living-in-an-off-grid-commune/"]} {"comments": ["31 January 2014"], "primary_url": ["http://aeon.co/magazine/science/the-search-for-quantum-gravity/", "http://aeon.co/magazine/philosophy/should-generation-ted-take-a-more-sceptical-view/", "http://aeon.co/magazine/technology/the-elon-musk-interview-on-mars/", "http://aeon.co/video/technology/analogue-people-in-a-digital-age-a-short-film-about-technology/", "http://aeon.co/video/technology/boxa-short-film-about-projection-mapping/", "http://aeon.co/video/technology/how-to-sharpen-pencils-a-short-film-about-a-master-artisan/", "http://aeon.co/magazine/technology/do-we-want-minority-report-policing/", "http://aeon.co/magazine/health/can-you-have-self-worth-without-self-love/", "http://aeon.co/magazine/technology/i-learnt-to-survive-like-an-11th-century-farmer/", "http://aeon.co/magazine/technology/can-tiny-plankton-help-reverse-climate-change/", "http://aeon.co/magazine/technology/are-halophytes-the-crop-of-the-future/", "http://aeon.co/magazine/technology/how-will-sexbots-change-human-relationships/", 
"http://aeon.co/video/technology/robotic-cheetah-a-short-film-about-biomimetic-robotics/", "http://aeon.co/video/technology/internet-archive-a-short-film-about-accessing-knowledge/", "http://aeon.co/video/technology/a-tiny-planet-a-short-film-about-wondrous-video-technology/", "http://aeon.co/magazine/culture/there-is-fortuitous-beauty-in-a-brute-force-attack/", "http://aeon.co/magazine/technology/can-we-design-systems-to-automate-ethics/", "http://aeon.co/magazine/technology/before-minecraft-or-snapchat-there-was-micromuse/", "http://aeon.co/magazine/technology/meet-darpas-new-generation-of-humanoid-robots/", "http://aeon.co/magazine/technology/the-problem-with-too-much-information/", "http://aeon.co/magazine/technology/can-nyc-be-completely-self-reliant/", "http://aeon.co/video/technology/theo-a-short-film-about-the-wind-eating-strandbeest/", "http://aeon.co/video/technology/terminal-a-short-film-about-the-mechanical-ballet-of-cargo/", "http://aeon.co/video/technology/metropolis-ii-a-short-film-about-the-city-of-tomorrow/", "http://aeon.co/magazine/culture/digital-art-should-be-about-possibilities-not-technicalities/", "http://aeon.co/magazine/society/can-sustainability-really-hope-to-beat-consumerism/", "http://aeon.co/magazine/technology/is-technology-making-the-world-too-complex/", "http://aeon.co/magazine/culture/creepypasta-is-how-the-internet-learns-our-fears/", "http://aeon.co/magazine/technology/virtual-afterlives-will-transform-humanity/", "http://aeon.co/magazine/technology/what-will-happen-to-my-online-identity-when-i-die/", "http://aeon.co/magazine/society/what-does-silicon-valley-tell-us-about-innovation/", "http://aeon.co/magazine/technology/the-rise-of-biotechnology-and-the-loss-of-scientific-neutrality/", "http://aeon.co/magazine/culture/why-i-gave-up-living-in-an-off-grid-commune/"]} {"comments": ["12 March 2014"], "primary_url": ["http://aeon.co/magazine/science/the-search-for-quantum-gravity/", 
"http://aeon.co/magazine/philosophy/should-generation-ted-take-a-more-sceptical-view/", "http://aeon.co/magazine/technology/the-elon-musk-interview-on-mars/", "http://aeon.co/video/technology/analogue-people-in-a-digital-age-a-short-film-about-technology/", "http://aeon.co/video/technology/boxa-short-film-about-projection-mapping/", "http://aeon.co/video/technology/how-to-sharpen-pencils-a-short-film-about-a-master-artisan/", "http://aeon.co/magazine/technology/do-we-want-minority-report-policing/", "http://aeon.co/magazine/health/can-you-have-self-worth-without-self-love/", "http://aeon.co/magazine/technology/i-learnt-to-survive-like-an-11th-century-farmer/", "http://aeon.co/magazine/technology/can-tiny-plankton-help-reverse-climate-change/", "http://aeon.co/magazine/technology/are-halophytes-the-crop-of-the-future/", "http://aeon.co/magazine/technology/how-will-sexbots-change-human-relationships/", "http://aeon.co/video/technology/robotic-cheetah-a-short-film-about-biomimetic-robotics/", "http://aeon.co/video/technology/internet-archive-a-short-film-about-accessing-knowledge/", "http://aeon.co/video/technology/a-tiny-planet-a-short-film-about-wondrous-video-technology/", "http://aeon.co/magazine/culture/there-is-fortuitous-beauty-in-a-brute-force-attack/", "http://aeon.co/magazine/technology/can-we-design-systems-to-automate-ethics/", "http://aeon.co/magazine/technology/before-minecraft-or-snapchat-there-was-micromuse/", "http://aeon.co/magazine/technology/meet-darpas-new-generation-of-humanoid-robots/", "http://aeon.co/magazine/technology/the-problem-with-too-much-information/", "http://aeon.co/magazine/technology/can-nyc-be-completely-self-reliant/", "http://aeon.co/video/technology/theo-a-short-film-about-the-wind-eating-strandbeest/", "http://aeon.co/video/technology/terminal-a-short-film-about-the-mechanical-ballet-of-cargo/", "http://aeon.co/video/technology/metropolis-ii-a-short-film-about-the-city-of-tomorrow/", 
"http://aeon.co/magazine/culture/digital-art-should-be-about-possibilities-not-technicalities/", "http://aeon.co/magazine/society/can-sustainability-really-hope-to-beat-consumerism/", "http://aeon.co/magazine/technology/is-technology-making-the-world-too-complex/", "http://aeon.co/magazine/culture/creepypasta-is-how-the-internet-learns-our-fears/", "http://aeon.co/magazine/technology/virtual-afterlives-will-transform-humanity/", "http://aeon.co/magazine/technology/what-will-happen-to-my-online-identity-when-i-die/", "http://aeon.co/magazine/society/what-does-silicon-valley-tell-us-about-innovation/", "http://aeon.co/magazine/technology/the-rise-of-biotechnology-and-the-loss-of-scientific-neutrality/", "http://aeon.co/magazine/culture/why-i-gave-up-living-in-an-off-grid-commune/"]} {"comments": ["31 March 2014"], "primary_url": ["http://aeon.co/magazine/science/the-search-for-quantum-gravity/", "http://aeon.co/magazine/philosophy/should-generation-ted-take-a-more-sceptical-view/", "http://aeon.co/magazine/technology/the-elon-musk-interview-on-mars/", "http://aeon.co/video/technology/analogue-people-in-a-digital-age-a-short-film-about-technology/", "http://aeon.co/video/technology/boxa-short-film-about-projection-mapping/", "http://aeon.co/video/technology/how-to-sharpen-pencils-a-short-film-about-a-master-artisan/", "http://aeon.co/magazine/technology/do-we-want-minority-report-policing/", "http://aeon.co/magazine/health/can-you-have-self-worth-without-self-love/", "http://aeon.co/magazine/technology/i-learnt-to-survive-like-an-11th-century-farmer/", "http://aeon.co/magazine/technology/can-tiny-plankton-help-reverse-climate-change/", "http://aeon.co/magazine/technology/are-halophytes-the-crop-of-the-future/", "http://aeon.co/magazine/technology/how-will-sexbots-change-human-relationships/", "http://aeon.co/video/technology/robotic-cheetah-a-short-film-about-biomimetic-robotics/", 
"http://aeon.co/video/technology/internet-archive-a-short-film-about-accessing-knowledge/", "http://aeon.co/video/technology/a-tiny-planet-a-short-film-about-wondrous-video-technology/", "http://aeon.co/magazine/culture/there-is-fortuitous-beauty-in-a-brute-force-attack/", "http://aeon.co/magazine/technology/can-we-design-systems-to-automate-ethics/", "http://aeon.co/magazine/technology/before-minecraft-or-snapchat-there-was-micromuse/", "http://aeon.co/magazine/technology/meet-darpas-new-generation-of-humanoid-robots/", "http://aeon.co/magazine/technology/the-problem-with-too-much-information/", "http://aeon.co/magazine/technology/can-nyc-be-completely-self-reliant/", "http://aeon.co/video/technology/theo-a-short-film-about-the-wind-eating-strandbeest/", "http://aeon.co/video/technology/terminal-a-short-film-about-the-mechanical-ballet-of-cargo/", "http://aeon.co/video/technology/metropolis-ii-a-short-film-about-the-city-of-tomorrow/", "http://aeon.co/magazine/culture/digital-art-should-be-about-possibilities-not-technicalities/", "http://aeon.co/magazine/society/can-sustainability-really-hope-to-beat-consumerism/", "http://aeon.co/magazine/technology/is-technology-making-the-world-too-complex/", "http://aeon.co/magazine/culture/creepypasta-is-how-the-internet-learns-our-fears/", "http://aeon.co/magazine/technology/virtual-afterlives-will-transform-humanity/", "http://aeon.co/magazine/technology/what-will-happen-to-my-online-identity-when-i-die/", "http://aeon.co/magazine/society/what-does-silicon-valley-tell-us-about-innovation/", "http://aeon.co/magazine/technology/the-rise-of-biotechnology-and-the-loss-of-scientific-neutrality/", "http://aeon.co/magazine/culture/why-i-gave-up-living-in-an-off-grid-commune/"]} {"comments": ["30 May 2014"], "primary_url": ["http://aeon.co/magazine/science/the-search-for-quantum-gravity/", "http://aeon.co/magazine/philosophy/should-generation-ted-take-a-more-sceptical-view/", 
"http://aeon.co/magazine/technology/the-elon-musk-interview-on-mars/", "http://aeon.co/video/technology/analogue-people-in-a-digital-age-a-short-film-about-technology/", "http://aeon.co/video/technology/boxa-short-film-about-projection-mapping/", "http://aeon.co/video/technology/how-to-sharpen-pencils-a-short-film-about-a-master-artisan/", "http://aeon.co/magazine/technology/do-we-want-minority-report-policing/", "http://aeon.co/magazine/health/can-you-have-self-worth-without-self-love/", "http://aeon.co/magazine/technology/i-learnt-to-survive-like-an-11th-century-farmer/", "http://aeon.co/magazine/technology/can-tiny-plankton-help-reverse-climate-change/", "http://aeon.co/magazine/technology/are-halophytes-the-crop-of-the-future/", "http://aeon.co/magazine/technology/how-will-sexbots-change-human-relationships/", "http://aeon.co/video/technology/robotic-cheetah-a-short-film-about-biomimetic-robotics/", "http://aeon.co/video/technology/internet-archive-a-short-film-about-accessing-knowledge/", "http://aeon.co/video/technology/a-tiny-planet-a-short-film-about-wondrous-video-technology/", "http://aeon.co/magazine/culture/there-is-fortuitous-beauty-in-a-brute-force-attack/", "http://aeon.co/magazine/technology/can-we-design-systems-to-automate-ethics/", "http://aeon.co/magazine/technology/before-minecraft-or-snapchat-there-was-micromuse/", "http://aeon.co/magazine/technology/meet-darpas-new-generation-of-humanoid-robots/", "http://aeon.co/magazine/technology/the-problem-with-too-much-information/", "http://aeon.co/magazine/technology/can-nyc-be-completely-self-reliant/", "http://aeon.co/video/technology/theo-a-short-film-about-the-wind-eating-strandbeest/", "http://aeon.co/video/technology/terminal-a-short-film-about-the-mechanical-ballet-of-cargo/", "http://aeon.co/video/technology/metropolis-ii-a-short-film-about-the-city-of-tomorrow/", "http://aeon.co/magazine/culture/digital-art-should-be-about-possibilities-not-technicalities/", 
"http://aeon.co/magazine/society/can-sustainability-really-hope-to-beat-consumerism/", "http://aeon.co/magazine/technology/is-technology-making-the-world-too-complex/", "http://aeon.co/magazine/culture/creepypasta-is-how-the-internet-learns-our-fears/", "http://aeon.co/magazine/technology/virtual-afterlives-will-transform-humanity/", "http://aeon.co/magazine/technology/what-will-happen-to-my-online-identity-when-i-die/", "http://aeon.co/magazine/society/what-does-silicon-valley-tell-us-about-innovation/", "http://aeon.co/magazine/technology/the-rise-of-biotechnology-and-the-loss-of-scientific-neutrality/", "http://aeon.co/magazine/culture/why-i-gave-up-living-in-an-off-grid-commune/"]}
To reiterate, I am having trouble outputting one list of URLs from the parent page and one list of corresponding dates from the nested pages. I am new to Scrapy and to Python, so hopefully someone can point me in the right direction.
Upvotes: 0
Views: 1239
Reputation: 23796
Your code is iterating on the wrong thing.
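The response.xpath('//*[@id="latestPosts"]') call returns a list with only one selector, and that selector contains all the article links. The loop body therefore runs once, item['primary_url'] ends up holding every URL, and each request carries that same item, so every response yields the full list again.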
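To see the difference without running a crawl, here is a small sketch that uses the standard library's ElementTree as a stand-in for Scrapy selectors. The markup is a hypothetical, simplified version of the listing page (the real page is surely more complex), and the three URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the listing page markup.
PAGE = """
<body>
  <div id="latestPosts">
    <div><div>
      <div><a href="http://aeon.co/magazine/a/">A</a></div>
      <div><a href="http://aeon.co/magazine/b/">B</a></div>
      <div><a href="http://aeon.co/magazine/c/">C</a></div>
    </div></div>
  </div>
</body>
"""

root = ET.fromstring(PAGE)

# The id matches a single wrapper element, so a loop over it runs once...
wrappers = root.findall(".//*[@id='latestPosts']")
# ...and a relative query from that wrapper collects every link at once.
all_links = [a.get("href") for w in wrappers for a in w.findall("div/div/div/a")]

# Selecting the per-article containers gives one node, and one link, each.
containers = root.findall(".//*[@id='latestPosts']/div/div/div")
per_article = [[a.get("href") for a in c.findall("./a")] for c in containers]

print(len(wrappers), len(all_links))    # 1 3
print(len(containers), per_article[0])  # 3 ['http://aeon.co/magazine/a/']
```

With the wrapper you get one iteration and all the links together. With the inner containers you get one iteration per article, which is what the spider needs.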
Try changing the loop to:
for sel in response.xpath('//*[@id="latestPosts"]/div/div/div'):
    item = AeonItem()
    item['primary_url'] = sel.xpath('./a/@href').extract()
    ...
You probably want to apply the same change on the other callback too -- I'll leave the rest of the fun for you. =)
Upvotes: 1