Reputation: 31
I have set up my Scrapy crawler in a single Python script:
import sys
import csv
import scrapy
from scrapy.http import FormRequest
from scrapy.crawler import CrawlerProcess

class MyItem(scrapy.Item):
    test = scrapy.Field()

class Spider(scrapy.Spider):
    start_urls = [
        "blah.com",
    ]

    def parse(self, response):
        blahblah = MyItem()
        # Some Code
        yield blahblah

class crawler:
    def start(self):
        process = CrawlerProcess({
            'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
            'LOG_LEVEL': 'INFO',
            'FEED_FORMAT': 'csv',
            'FEED_URI': 'Output.csv'
        })
        process.crawl(Spider)
        process.start()

app = crawler()
app.start()
and this is working perfectly.
Now, how do I add a Scrapy spider middleware that implements
process_spider_exception(response, exception, spider)
to this script and enable it through the CrawlerProcess settings?
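For reference, this is roughly what I imagine it would look like (just a sketch; the ExceptionMiddleware class name and the '__main__.ExceptionMiddleware' path are my own guesses, I don't know whether this is the right way):

class ExceptionMiddleware:
    # spider middleware exposing the hook I am asking about
    def process_spider_exception(self, response, exception, spider):
        spider.logger.error("Exception %r while parsing %s", exception, response.url)
        return []  # returning an iterable suppresses the exception

class crawler:
    def start(self):
        process = CrawlerProcess({
            'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
            'LOG_LEVEL': 'INFO',
            'FEED_FORMAT': 'csv',
            'FEED_URI': 'Output.csv',
            # is this the correct way to enable a middleware defined in the same script?
            'SPIDER_MIDDLEWARES': {'__main__.ExceptionMiddleware': 543},
        })
        process.crawl(Spider)
        process.start()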
Upvotes: 2
Views: 601
Reputation: 31
I solved this problem using Twisted's errback, which is attached to a Request alongside its callback and handles the error whenever an exception is raised while processing the request (for example, a DNS lookup failure).
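A rough sketch of what that looks like (the spider and handler names are placeholders; the exception classes are the ones used in the Scrapy/Twisted docs, and MyItem is the item class from the script above):

import scrapy
from scrapy.spidermiddlewares.httperror import HttpError
from twisted.internet.error import DNSLookupError, TCPTimedOutError, TimeoutError

class Spider(scrapy.Spider):
    name = "spider"
    start_urls = [
        "blah.com",
    ]

    def start_requests(self):
        # build the requests by hand so every one gets an errback next to its callback
        for url in self.start_urls:
            yield scrapy.Request(url, callback=self.parse, errback=self.on_error)

    def parse(self, response):
        blahblah = MyItem()
        # Some Code
        yield blahblah

    def on_error(self, failure):
        # failure is a twisted.python.failure.Failure wrapping the original exception
        if failure.check(HttpError):
            self.logger.error("HttpError on %s", failure.value.response.url)
        elif failure.check(DNSLookupError):
            self.logger.error("DNSLookupError on %s", failure.request.url)
        elif failure.check(TimeoutError, TCPTimedOutError):
            self.logger.error("TimeoutError on %s", failure.request.url)
        else:
            self.logger.error(repr(failure))

With this in place, a failed request (such as a DNS lookup error) ends up in on_error where it can be logged or handled, instead of only showing up in Scrapy's error log.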
Relevant [question]: how to scrapy handle dns lookup failed
Upvotes: 1