xcoder

Reputation: 31

Adding a Scrapy middleware from the current script

I have set up my Scrapy crawler in a single Python script:

import scrapy
from scrapy.crawler import CrawlerProcess


class MyItem(scrapy.Item):
    test = scrapy.Field()


class Spider(scrapy.Spider):
    name = "spider"  # Scrapy requires every spider to have a name
    start_urls = [
        "http://blah.com",  # request URLs need a scheme
    ]

    def parse(self, response):
        blahblah = MyItem()
        # Some Code
        yield blahblah


class Crawler:
    def start(self):
        process = CrawlerProcess({
            'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
            'LOG_LEVEL': 'INFO',
            'FEED_FORMAT': 'csv',
            'FEED_URI': 'Output.csv'
        })
        process.crawl(Spider)
        process.start()

app = Crawler()
app.start()

and this works perfectly. Now, how do I add a Scrapy spider middleware hook such as process_spider_exception(response, exception, spider) to this script and enable it through the CrawlerProcess settings?
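
For reference, this is the shape of the hook I mean. I assume a class like the following (the class name and priority are placeholders) would be registered through the SPIDER_MIDDLEWARES setting, using a '__main__.' path since the class lives in the script itself:

class SpiderExceptionMiddleware:
    # Spider middleware hook: invoked when a spider callback raises an exception
    def process_spider_exception(self, response, exception, spider):
        spider.logger.error("Callback failed for %s: %r", response.url, exception)
        return []  # returning an iterable stops the exception from propagating

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
    'LOG_LEVEL': 'INFO',
    'FEED_FORMAT': 'csv',
    'FEED_URI': 'Output.csv',
    # the '__main__.' path points at the class defined in this script
    'SPIDER_MIDDLEWARES': {
        '__main__.SpiderExceptionMiddleware': 543,
    },
})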

Upvotes: 2

Views: 601

Answers (1)

xcoder

Reputation: 31

I solved this problem using Twisted's errback mechanism: an errback can be attached behind a callback and handles the error if the callback raises an exception.

Relevant question: "how to scrapy handle dns lookup failed"
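
A minimal sketch of the approach, following the errback pattern from the Scrapy docs (the spider name, URL, and handler name are placeholders):

import scrapy
from twisted.internet.error import DNSLookupError


class Spider(scrapy.Spider):
    name = "spider"
    start_urls = ["http://blah.com"]

    def start_requests(self):
        for url in self.start_urls:
            # errback receives a twisted Failure when the
            # request cannot be processed successfully
            yield scrapy.Request(url, callback=self.parse, errback=self.on_error)

    def parse(self, response):
        # Some Code
        pass

    def on_error(self, failure):
        # failure is a twisted.python.failure.Failure
        if failure.check(DNSLookupError):
            self.logger.error("DNS lookup failed: %s", failure.request.url)
        else:
            self.logger.error(repr(failure))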

Upvotes: 1
