Reputation: 1007
I am having a few problems sending my items to the pipeline, because my request passes through several functions.
Is there any manual way of sending item objects to a Scrapy pipeline? I don't know the internal details of Scrapy.
Suppose I have a function like this:
def parseDetails(self, response):
    item = DmozItem()
    item['test'] = "mytest"
    # Hypothetical helper I wish existed:
    sendToPipeline(pipelineName, item)
Upvotes: 2
Views: 1448
Reputation: 6268
If you delegate directly to the ItemPipelineManager, you will raise unhandled exceptions in the manager.
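For example, a direct call like this (a minimal sketch mirroring the other answer here, assuming a FilterPipeline whose process_item raises DropItem):

itemproc = self.crawler.engine.scraper.itemproc
itemproc.process_item(item, self)  # any DropItem ends up in the returned Deferred

produces a log like: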
[2018-07-21 20:00:02] CRITICAL - Unhandled error in Deferred:
[2018-07-21 20:00:02] CRITICAL -
Traceback (most recent call last):
  File "/home/vagrant/.local/share/virtualenvs/vagrant-gKDsaKU3/lib/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/vagrant/monitor/pipelines/filter.py", line 24, in process_item
    raise DropItem()
scrapy.exceptions.DropItem
It might also unintentionally alter the pipeline's state and affect later processing.
I think the better approach is to grab the pipeline instance you're looking for and call it directly:
from scrapy.exceptions import DropItem

try:
    # Manually call the filter on the item (here named p)
    f = utils.get_pipeline_instance(self, FilterPipeline)
    f.process_item(p, self)
except DropItem:
    pass  # the pipeline rejected the item; ignore it
Using a helper function:
from scrapy.exceptions import NotConfigured

def get_pipeline_instance(spider, pipeline_class):
    # The ItemPipelineManager keeps every enabled pipeline in .middlewares
    manager = spider.crawler.engine.scraper.itemproc
    for pipe in manager.middlewares:
        if isinstance(pipe, pipeline_class):
            return pipe
    else:
        raise NotConfigured('Invalid pipeline')
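For context, a FilterPipeline along these lines would trigger the try/except above (a hypothetical sketch; the original answer only shows that its process_item can raise DropItem):

from scrapy.exceptions import DropItem

class FilterPipeline:
    def process_item(self, item, spider):
        # Assumed validation logic: reject items missing the field we set
        if not item.get('test'):
            raise DropItem('missing required field')
        return item

The pipeline still has to be enabled in ITEM_PIPELINES so the manager instantiates it and get_pipeline_instance can find it.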
Upvotes: 0
Reputation: 3002
def parseDetails(self, response):
    item = DmozItem()
    item['test'] = "mytest"
    # Call the pipeline: itemproc is the ItemPipelineManager, and
    # process_item runs the item through every enabled pipeline.
    itemproc = self.crawler.engine.scraper.itemproc
    itemproc.process_item(item, self)
    return item
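Note that process_item returns a twisted Deferred, so a DropItem raised by a pipeline surfaces there rather than in the callback (see the traceback in the other answer). A minimal sketch of swallowing it with an errback, as an assumption on my part rather than part of the original answer:

from scrapy.exceptions import DropItem

d = itemproc.process_item(item, self)
# trap() absorbs DropItem failures and re-raises anything else
d.addErrback(lambda failure: failure.trap(DropItem))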
Upvotes: 2