Vimal Annamalai

Reputation: 149

How to run a function after all crawling is done in Scrapy?

The spider_closed() function is not being executed. If I put only a print statement in it, it prints, but if I call a function and try to return the value, it does not work.

import scrapy
import re
from pydispatch import dispatcher
from scrapy import signals

from SouthShore.items import Product
from SouthShore.internalData import internalApi
from scrapy.http import Request

class bestbuycaspider(scrapy.Spider):
    name = "bestbuy_dca"

    allowed_domains = ["bestbuy.ca"]

    start_urls = ["http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+beds",
              "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+night+stand",
              "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+headboard",
              "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+desk",
              "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+bookcase",
              "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+dresser",
              "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+tv+stand",
              "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+armoire",
              "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+kids",
              "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+changing+table",
              "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+baby"]

    def __init__(self, jsondetails="", serverdetails="", *args, **kwargs):
        super(bestbuycaspider, self).__init__(*args, **kwargs)
        dispatcher.connect(self.spider_closed, signal=signals.spider_closed)
        self.jsondetails = jsondetails
        self.serverdetails = serverdetails
        self.data = []

    def parse(self, response):
        # my stuff here
        pass

    def spider_closed(self, spider):
        print("returning values")
        self.results = internalApi(self.jsondetails, self.serverdetails)
        self.results['extractedData'] = self.data
        print(self.results)
        yield self.results

1) I want to call a function once crawling is done and return the scraped values.

Upvotes: 1

Views: 2470

Answers (1)

Granitosaurus

Reputation: 21436

You can create an Item Pipeline with a close_spider() method:

class MyPipeline(object):
    def close_spider(self, spider):
        # called automatically once the spider has finished crawling
        do_something_here()
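
For example, a pipeline that collects every scraped item and hands the whole batch to a post-processing call when crawling finishes could look roughly like the sketch below. internalApi and the jsondetails/serverdetails attributes are taken from the spider in your question; treat this as an outline rather than a drop-in implementation:

# pipelines.py -- rough sketch
from SouthShore.internalData import internalApi  # import assumed from the question


class CollectAndReportPipeline(object):

    def open_spider(self, spider):
        # called when the spider starts; set up a buffer for the items
        self.items = []

    def process_item(self, item, spider):
        # called for every scraped item; keep a copy and pass the item on
        self.items.append(dict(item))
        return item

    def close_spider(self, spider):
        # called once, after all crawling and item processing is done
        results = internalApi(spider.jsondetails, spider.serverdetails)
        results['extractedData'] = self.items
        spider.logger.info('post-processed %d items', len(self.items))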

Just don't forget to activate it in settings.py as described in the documentation link above.
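
Activation is just an entry in the ITEM_PIPELINES setting; the module path below is a placeholder for wherever the pipeline class actually lives:

# settings.py
ITEM_PIPELINES = {
    'SouthShore.pipelines.CollectAndReportPipeline': 300,
}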

Upvotes: 4
