Reputation: 802
I have a spider that gets the URLs to scrape from a list. My problem is that when I run the spider, no data is scraped. What is strange to me, and what I can't seem to solve, is that the spider does visit each site, but no data comes back out.
My spider looks like this:
import scrapy
import re
import pandas
import json
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from genericScraper.items import ClothesItem
from scrapy.exceptions import CloseSpider
from scrapy.http import Request


class ClothesSpider(CrawlSpider):
    name = "clothes_spider"

    # Allowed domain
    allowed_domain = ['www.amazon.com']

    colnames = ['nombre', 'url']
    data = pandas.read_csv('URLClothesData.csv', names=colnames)
    name_list = data.nombre.tolist()
    URL_list = data.url.tolist()

    # Drop the first element of each list, which is the header row
    name_list.pop(0)
    URL_list.pop(0)

    start_urls = URL_list

    custom_settings = {
        'FEED_FORMAT': 'csv',
        'FEED_URI': 'ClothesData.csv'
    }

    def parse_item(self, response):
        cothesAmz_item = ClothesItem()
        cothesAmz_item['nombreProducto'] = response.xpath(
            'normalize-space(//span[contains(@id, "productTitle")]/text())'
        ).extract()
        yield cothesAmz_item
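For reference, ClothesItem in genericScraper/items.py needs a nombreProducto field; a minimal assumed sketch (the actual file isn't shown here) would be:

    import scrapy

    class ClothesItem(scrapy.Item):
        # Assumed minimal item definition; only the field the spider fills is declared
        nombreProducto = scrapy.Field()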
What I see in my console is this
Upvotes: 0
Views: 59
Reputation: 799
By default, when a spider crawls the URLs in start_urls, the callback called for each response is:

    def parse(self, response):
        pass  # Your logic goes here

You can try renaming your function parse_item to parse.
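For example, the extraction logic from the question would simply move into parse; a sketch reusing the question's item class, field name, and XPath unchanged:

    def parse(self, response):
        # Renamed from parse_item so Scrapy invokes it for each start URL by default
        cothesAmz_item = ClothesItem()
        cothesAmz_item['nombreProducto'] = response.xpath(
            'normalize-space(//span[contains(@id, "productTitle")]/text())'
        ).extract()
        yield cothesAmz_item

Alternatively, keep the name parse_item and set it as the callback explicitly, e.g. by defining start_requests and yielding scrapy.Request(url, callback=self.parse_item) for each URL.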
Upvotes: 1