Pratting_About

Reputation: 3

Passing Variables to Scrapy

I have a basic Scrapy project in which I have hard-coded two variables, pProd and pReviews. I would now like either to read these variables from a CSV file or to pass them in when calling the spider. I have been trying for the last couple of hours but seem to be getting nowhere using the -a attribute when calling the spider, e.g.:

scrapy crawl myspider -a Prod="P123" -a Revs="200" -o test.csv

This is the code I have with the hard coded variables:

import scrapy
from scrapy import Spider, Request
import re
import json

class myspider(Spider):
    name = 'myspider'
    allowed_domains = ['mydom.com']
    start_urls = ['https://api.mydom.com']

    def start_requests(self):
        urls = ["https://api.mydom.com"]
        pProd = "P123"
        pReviews = 200
        for url in urls:
            #Generate URL as API only brings back 100 at a time
            for i in range(0, pReviews, 100):
                links = 'https://api.mydom.com/data/reviews.json?Filter=ProductId%3A' + pProd + '&Offset=' + str(i) + '&passkey=123qwe'
                yield scrapy.Request(
                    url=str(links),
                    cb_kwargs={'ProductID' : pProd},
                    callback=self.parse_reviews,
                )
                
    def parse_reviews(self, response, ProductID):
        data = json.loads(response.text)
        proddata = data['Includes']
        reviews = data['Results']
        p_prodid = ProductID
        try:
            p_prodcat = proddata['Products'][ProductID]['CategoryId']
        except KeyError:
            p_prodcat = None

        for review in reviews:
            try:
                r_reviewdate = review['SubmissionTime']
            except KeyError:
                r_reviewdate = None

            yield {
                'prodid': p_prodid,
                'prodcat': p_prodcat,
                'reviewdate': r_reviewdate,
            }

I have tried several different approaches, including adding the variable names to the start_requests definition, like:

def start_requests(self, pProd='', pReviews='', **kwargs):

but seem to be getting nowhere. Would appreciate a little guidance as to where I am going wrong.

Upvotes: 0

Views: 221

Answers (1)

João Santos

Reputation: 194

You don't have to declare a constructor (`__init__`) every time you write a Scrapy spider; you can just pass the parameters on the command line as before:

scrapy crawl myspider -a parameter1=value1 -a parameter2=value2

and in your spider code they are available as spider attributes:

class MySpider(Spider):
    name = 'myspider'
    ...
    def parse(self, response):
        ...
        # -a arguments are stored as string attributes on the spider
        if self.parameter1 == 'value1':
            # this is True

        # or, with a default if the argument was not supplied:
        if getattr(self, 'parameter2', None) == 'value2':
            # this is also True

*from How to pass a user defined argument in scrapy spider
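Applied to the spider in the question, the main pitfall is that `-a` values always arrive as strings, so `Revs` needs an `int()` conversion before it can drive `range()`. A minimal sketch of the URL-generation step (the helper name `build_review_urls` is illustrative, not part of Scrapy; the endpoint and passkey are the placeholders from the question):

```python
def build_review_urls(prod, revs):
    """Build the paginated review-API URLs for one product.

    prod and revs arrive as strings when passed via `-a`, so revs is
    converted to int before driving the 100-reviews-per-page offset loop.
    """
    revs = int(revs)  # -a arguments are always strings
    base = 'https://api.mydom.com/data/reviews.json'
    return [
        f'{base}?Filter=ProductId%3A{prod}&Offset={i}&passkey=123qwe'
        for i in range(0, revs, 100)
    ]

# Inside start_requests the spider would then do roughly:
#     for url in build_review_urls(self.Prod, self.Revs):
#         yield scrapy.Request(url, cb_kwargs={'ProductID': self.Prod},
#                              callback=self.parse_reviews)
```

With `scrapy crawl myspider -a Prod="P123" -a Revs="200"`, Scrapy's default `__init__` sets `self.Prod = "P123"` and `self.Revs = "200"` automatically.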

Upvotes: 1
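The other option the question mentions, reading the values from a CSV file, can be sketched with only the standard library. The filename `params.csv` and its `prod,reviews` header are assumptions for illustration:

```python
import csv

def read_spider_params(path):
    """Read (prod, reviews) pairs from a CSV with a `prod,reviews` header."""
    with open(path, newline='') as f:
        return [(row['prod'], int(row['reviews']))
                for row in csv.DictReader(f)]

# In start_requests the spider could then loop over every pair:
#     for pProd, pReviews in read_spider_params('params.csv'):
#         for i in range(0, pReviews, 100):
#             ...  # build and yield the request as before
```

This also allows crawling several products in one run, one CSV row per product.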
