user3261706

Reputation: 1

Scrapy outputting the same thing hundreds of times

I followed the tutorial on the Scrapy site and tried editing its code to practice on Wikipedia. When I run it, it outputs the text of the page, but it does so hundreds of times: both the JSON file and the console contain the same thing printed over and over. I think it may have something to do with the function? Also, what is the difference between sel.xpath and site.xpath?

Thanks!

Here is the code:

from scrapy.spider import Spider
from scrapy.selector import Selector

from tutorial.items import DmozItem

class DmozSpider(Spider):
   name = "dmoz"
   allowed_domains = ["wikipedia.com"]
   start_urls = [
       "http://en.wikipedia.org/wiki/Caesar_Hull"
   ]

   def parse(self, response):
       sel = Selector(response)
       sites = sel.xpath('//div')
       items =[]
       for site in sites:
            item = DmozItem()
            item['title'] = sel.xpath('.//p/text()').extract()
            items.append(item)
       return items

Upvotes: 0

Views: 137

Answers (1)

Guy Gavriely

Reputation: 11396

If you want the second XPath to be relative to the first one, then instead of:

item['title'] = sel.xpath('.//p/text()').extract()

use:

item['title'] = site.xpath('.//p/text()').extract()

Looping over //div produces one item for every div found in the document, which is why you get hundreds of items.

Because sel is the selector for the whole response, running sel.xpath('.//p/text()') is the same as running sel.xpath('//p/text()'), so every iteration returns the same result (all the paragraph text on the page) over and over again.
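
To make the difference concrete, here is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's Selector, so it runs without Scrapy installed. The tiny PAGE document and the names root, divs, from_root and from_div are made up for illustration. ElementTree supports only a subset of XPath and has no text() step, so the sketch reads .text instead; the point it shows is only that a relative query run on the document root ignores the loop variable.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature page: two divs, each with its own paragraph.
PAGE = """
<html><body>
  <div><p>first</p></div>
  <div><p>second</p></div>
</body></html>
""".strip()

root = ET.fromstring(PAGE)     # plays the role of `sel` (the whole document)
divs = root.findall(".//div")  # plays the role of `sites`

# Buggy pattern: the query runs against the whole document on every pass,
# so each iteration collects every paragraph again.
from_root = [[p.text for p in root.findall(".//p")] for _ in divs]

# Fixed pattern: the query runs against the current div only.
from_div = [[p.text for p in div.findall(".//p")] for div in divs]

print(from_root)  # [['first', 'second'], ['first', 'second']]
print(from_div)   # [['first'], ['second']]
```

In the spider, the same idea applies: site.xpath('.//p/text()') is evaluated relative to the div currently held in site, while sel.xpath(...) always starts from the whole page.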

Upvotes: 1
