Reputation: 1
I followed the tutorial on the Scrapy site and tried editing its code to practice on Wikipedia. It outputs the text of the page, but it does so hundreds of times: both the JSON file and the console contain the same thing printed over and over. I think it may have something to do with the parse function? Also, what is the difference between sel.xpath and site.xpath?
Thanks!
Here is the code:
    from scrapy.spider import Spider
    from scrapy.selector import Selector
    from tutorial.items import DmozItem

    class DmozSpider(Spider):
        name = "dmoz"
        allowed_domains = ["wikipedia.com"]
        start_urls = [
            "http://en.wikipedia.org/wiki/Caesar_Hull"
        ]

        def parse(self, response):
            sel = Selector(response)
            sites = sel.xpath('//div')
            items = []
            for site in sites:
                item = DmozItem()
                item['title'] = sel.xpath('.//p/text()').extract()
                items.append(item)
            return items
Upvotes: 0
Views: 137
Reputation: 11396
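If you want the second XPath to be relative to the first one, then instead of:
    item['title'] = sel.xpath('.//p/text()').extract()
use:
    item['title'] = site.xpath('.//p/text()').extract()
Looping over //div creates one item for every div found in the document, so you get as many items as there are divs.
Because sel is the selector for the whole page, running sel.xpath('.//p/text()') is the same as running sel.xpath('//p/text()'): it matches every paragraph in the document regardless of which div the loop is on.
Hence you keep getting the same result over and over again. site.xpath, by contrast, searches only inside the current div.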
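To make the difference concrete, here is a minimal sketch that can be run without crawling anything. It uses the standard library's xml.etree.ElementTree instead of Scrapy, purely as a stand-in, and the two-div page is made up for illustration. Here root plays the role of sel and each div plays the role of site; the same rule about where a './/p' query starts should carry over to Scrapy selectors.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature page: two divs, each holding one paragraph.
PAGE = """
<html><body>
  <div><p>first</p></div>
  <div><p>second</p></div>
</body></html>
""".strip()

root = ET.fromstring(PAGE)      # stands in for `sel` (the whole document)
divs = root.findall(".//div")   # stands in for `sites`

# Bug: querying the root inside the loop ignores the current div,
# so every iteration collects every paragraph in the document.
from_root = [[p.text for p in root.findall(".//p")] for div in divs]

# Fix: querying the current div returns only its own paragraphs.
from_div = [[p.text for p in div.findall(".//p")] for div in divs]

print(from_root)  # [['first', 'second'], ['first', 'second']]
print(from_div)   # [['first'], ['second']]
```

The first list repeats the full set of paragraphs once per div, which is the duplication described in the question; the second contains each div's own text exactly once.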
Upvotes: 1