Reputation: 9024
I have an item pipeline to process prices. I am getting errors while processing items in this pipeline, but the Scrapy error doesn't tell me which URL produced the error. Is there any way I can access the request object inside the pipeline?
import math
import re

from scrapy.exceptions import DropItem

def process_item(self, item, spider):
    """
    :param self:
    :param item:
    :param spider:
    """
    print(dir(spider))  # No request object here...
    quit()
    if not all(item['price']):
        raise DropItem
    item['price']['new'] = float(re.sub(
        r"\D", "", item['price']['new']))
    item['price']['old'] = float(re.sub(
        r"\D", "", item['price']['old']))
    try:
        item['price']['discount'] = math.ceil(
            100 - (100 * (item['price']['new'] /
                          item['price']['old'])))
    except ZeroDivisionError:
        print("Error in calculating discount {item} {request}".format(
            item=item, request=spider.request))  # here I want to see the culprit url...
        raise DropItem
    return item
Upvotes: 1
Views: 591
Reputation: 21271
Whatever class variables you define in your spider class can be accessed within your pipeline via spider.variable_name:
class MySpider(scrapy.Spider):
    name = "walmart"
    my_var = "TEST"
    my_dict = {'test': "test_val"}
Now in your pipeline you can do spider.name, spider.my_var, and spider.my_dict.
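A minimal sketch of that pattern, using plain classes to stand in for scrapy.Spider and the pipeline base so it runs without Scrapy installed (the attribute names are the ones from the example above):

```python
class MySpider:
    # class variables, as on a scrapy.Spider subclass
    name = "walmart"
    my_var = "TEST"
    my_dict = {'test': "test_val"}

class MyPipeline:
    def process_item(self, item, spider):
        # any class variable on the spider is reachable here
        item['source'] = spider.my_var
        return item

spider = MySpider()
item = MyPipeline().process_item({'price': 9.99}, spider)
print(item)  # {'price': 9.99, 'source': 'TEST'}
```

Note this only works for values known up front on the spider; per-request data such as the URL is not available this way.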
Upvotes: 2
Reputation: 1548
You can't from an ItemPipeline. You would be able to access the response (and response.url) from a spider middleware, but I think the easier solution would be to add a temporary url field assigned when you yield the item, something like:
yield {...
'url': response.url,
...}
The url can then be easily accessed inside the pipeline.
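A runnable sketch of the temporary-url approach, with a stand-in FakeResponse object so it works without Scrapy (the field name 'url' and the pop-and-discard step in the pipeline are illustrative choices, not a Scrapy API):

```python
class FakeResponse:
    # stand-in for scrapy's Response; only .url is needed here
    url = "https://example.com/product/1"

def parse(response):
    # in a real spider this would be the parse() callback
    yield {
        'price': {'new': '10', 'old': '20'},
        'url': response.url,  # temporary field carrying the source URL
    }

class PricePipeline:
    def process_item(self, item, spider):
        # read and strip the helper field; on any error, `url`
        # names the culprit page
        url = item.pop('url', None)
        return item

item = next(parse(FakeResponse()))
processed = PricePipeline().process_item(item, spider=None)
print(processed)  # {'price': {'new': '10', 'old': '20'}}
```

Popping the field in the pipeline keeps the final exported item free of the debugging url while still making it available wherever the error is raised.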
Upvotes: 3