Manuel

Reputation: 802

Spider not scraping a list of URLs

I have a spider that gets the URLs to scrape from a list. My problem is that when I run the spider, no data is scraped. What is strange to me, and what I can't seem to solve, is that the spider does enter each site, but no data comes back out.

My spider looks like this:

import scrapy
import re
import pandas
import json
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from genericScraper.items import ClothesItem
from scrapy.exceptions import CloseSpider
from scrapy.http import Request

class ClothesSpider(CrawlSpider):

    name = "clothes_spider"

    # Allowed domain
    allowed_domain = ['www.amazon.com']

    colnames = ['nombre', 'url']
    data = pandas.read_csv('URLClothesData.csv', names = colnames)

    name_list = data.nombre.tolist()
    URL_list = data.url.tolist()

    # Drop the first element of each list (the header row)
    name_list.pop(0)
    URL_list.pop(0)

    start_urls = URL_list

    custom_settings = {
        'FEED_FORMAT': 'csv',
        'FEED_URI': 'ClothesData.csv'
    }

    def parse_item(self, response):
        cothesAmz_item = ClothesItem()
        cothesAmz_item['nombreProducto'] = response.xpath('normalize-space(//span[contains(@id, "productTitle")]/text())').extract()

        yield cothesAmz_item

What I see in my console is this:

[screenshot of console output]

Upvotes: 0

Views: 59

Answers (1)

ThunderMind

Reputation: 799

By default, when a spider crawls the URLs in start_urls, each response is passed to the spider's default callback:

def parse(self, response):
    pass    # Your logic goes here

Try renaming your parse_item method to parse.
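A minimal sketch of that rename, reusing the ClothesItem class, CSV layout, and XPath from the question. It also swaps CrawlSpider for plain scrapy.Spider (since no link-following Rules are defined) and uses skiprows=1 in place of the pop(0) calls; those two changes are my simplifications, not part of the original code:

import pandas
import scrapy
from genericScraper.items import ClothesItem

class ClothesSpider(scrapy.Spider):
    name = "clothes_spider"

    # Skip the CSV header row instead of popping it off the lists
    data = pandas.read_csv('URLClothesData.csv', names=['nombre', 'url'], skiprows=1)
    start_urls = data.url.tolist()

    custom_settings = {
        'FEED_FORMAT': 'csv',
        'FEED_URI': 'ClothesData.csv'
    }

    # Renamed from parse_item: when a request has no explicit callback,
    # Scrapy delivers the response to the method named "parse"
    def parse(self, response):
        item = ClothesItem()
        item['nombreProducto'] = response.xpath('normalize-space(//span[contains(@id, "productTitle")]/text())').extract()
        yield item

Alternatively, you can keep the parse_item name and point Scrapy at it explicitly by overriding start_requests and yielding scrapy.Request(url, callback=self.parse_item) for each URL.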

Upvotes: 1
