Raheel

Reputation: 9024

How to access request object inside Item Pipeline Scrapy

I have an item pipeline to process prices. I am getting errors while processing items in this pipeline, but the Scrapy error doesn't tell me which URL produced the error. Is there any way I can access the request object inside the pipeline?

import math
import re

from scrapy.exceptions import DropItem


def process_item(self, item, spider):
    """
        :param self:
        :param item:
        :param spider:
    """
    print(dir(spider))  # No request object here...
    quit()
    if not all(item['price']):
        raise DropItem

    item['price']['new'] = float(re.sub(
        r"\D", "", item['price']['new']))
    item['price']['old'] = float(re.sub(
        r"\D", "", item['price']['old']))
    try:
        item['price']['discount'] = math.ceil(
            100 - (100 * (item['price']['new'] /
                          item['price']['old'])))
    except ZeroDivisionError:
        # here I want to see the culprit url...
        print("Error in calculating discount {item} {request}".format(
            item=item, request=spider.request))
        raise DropItem

    return item

Upvotes: 1

Views: 591

Answers (2)

Umair Ayub

Reputation: 21271

Whatever class variables you define in your spider class can be accessed within your pipeline via spider.variable_name:

class MySpider(scrapy.Spider):
    name = "walmart"
    my_var = "TEST"
    my_dict = {'test': "test_val"}

Now in your pipeline you can do spider.name, spider.my_var, spider.my_dict.
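For example, a minimal pipeline sketch (the pipeline class name is illustrative, and the spider attributes match the example above) could read those attributes like this:

```python
class MyPipeline:
    def process_item(self, item, spider):
        # Class attributes defined on the spider are reachable
        # through the `spider` argument passed to every pipeline method.
        print(spider.name)             # the spider's name, e.g. "walmart"
        print(spider.my_var)           # "TEST"
        print(spider.my_dict['test'])  # "test_val"
        return item
```

Note that this only exposes spider-level state, not the per-request/response context the question asks about.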

Upvotes: 2

Wilfredo

Reputation: 1548

You can't from an ItemPipeline. You would be able to access the response (and response.url) from a spider middleware, but I think the easier solution would be to add a temporary url field assigned when you yield the item, something like:

yield {...
       'url': response.url,
       ...}

Then the url can be easily accessed inside the pipeline.
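With that extra field in place, the pipeline from the question can report the offending page. A sketch (the pipeline class name is illustrative, the 'url' field matches the yield above, and a plain exception stands in for Scrapy's DropItem so the snippet has no Scrapy dependency):

```python
import math


class PricePipeline:
    def process_item(self, item, spider):
        try:
            item['price']['discount'] = math.ceil(
                100 - (100 * (item['price']['new'] /
                              item['price']['old'])))
        except ZeroDivisionError:
            # item['url'] was attached in the spider when the item was
            # yielded, so the culprit page can be logged before the item
            # is dropped (raise DropItem here in a real Scrapy project).
            raise ValueError(
                "Error in calculating discount for {url}".format(
                    url=item['url']))
        return item
```

The temporary field can be popped off the item in a later pipeline stage if it shouldn't end up in the exported data.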

Upvotes: 3
