Reputation: 3
I have a basic Scrapy project in which two variables, pProd and pReviews, are hard-coded. I would now like either to read these variables from a CSV file or to pass them when calling the spider. I have been trying for the last couple of hours but am getting nowhere with the -a option when calling the spider, e.g.:
scrapy crawl myspider -a Prod="P123" -a Revs="200" -o test.csv
This is the code I have with the hard coded variables:
import scrapy
from scrapy import Spider, Request
import re
import json

class myspider(Spider):
    name = 'myspider'
    allowed_domains = ['mydom.com']
    start_urls = ['https://api.mydom.com']

    def start_requests(self):
        urls = ["https://api.mydom.com"]
        pProd = "P123"
        pReviews = 200
        for url in urls:
            #Generate URL as API only brings back 100 at a time
            for i in range(0, pReviews, 100):
                links = 'https://api.mydom.com/data/reviews.json?Filter=ProductId%3A' + pProd + '&Offset=' + str(i) + '&passkey=123qwe'
                yield scrapy.Request(
                    url=str(links),
                    cb_kwargs={'ProductID' : pProd},
                    callback=self.parse_reviews,
                )

    def parse_reviews(self, response, ProductID):
        data = json.loads(response.text)
        proddata = data['Includes']
        reviews = data['Results']
        p_prodid = ProductID
        try:
            p_prodcat = proddata['Products'][ProductID]['CategoryId']
        except:
            p_prodcat = None
        for review in reviews:
            try:
                r_reviewdate = review['SubmissionTime']
            except:
                r_reviewdate = None
            yield{
                'prodid' : p_prodid,
                'prodcat' : p_prodcat,
                'reviewdate' : r_reviewdate,
            }
I have tried several approaches, including adding the variables as parameters of start_requests, like:
def start_requests(self, pProd='', pReviews='', **kwargs):
but I seem to be getting nowhere. I would appreciate a little guidance as to where I am going wrong.
Upvotes: 0
Views: 221
Reputation: 194
You don't have to declare the constructor (__init__) every time you write a Scrapy spider; you can just pass the parameters on the command line as before:
scrapy crawl myspider -a parameter1=value1 -a parameter2=value2
and in your spider code they are available as attributes of the spider:
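class MySpider(Spider):
    name = 'myspider'
    ...

    def parse(self, response):
        ...
        # arguments passed with -a arrive as strings
        if self.parameter1 == 'value1':
            # this is True
            ...
        # or also (note the attribute name is given as a string)
        if getattr(self, 'parameter2') == 'value2':
            # this is also True
            ...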
*from How to pass a user defined argument in scrapy spider
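Applied to the spider in the question, a minimal sketch might look like the following. It only covers building the URLs from the -a arguments, so it runs without Scrapy installed. The helper name review_urls, the stand-in object fake_spider, and the fallback defaults are my own assumptions and not part of the original code. The key point is that Scrapy sets each -a name=value pair as a string attribute on the spider, so Revs has to be converted with int() before it goes into range().

```python
from types import SimpleNamespace

API = "https://api.mydom.com/data/reviews.json"  # endpoint taken from the question
PAGE_SIZE = 100  # the API only brings back 100 reviews at a time


def review_urls(spider):
    """Build the paged review URLs from a spider's -a arguments (hypothetical helper)."""
    # Scrapy copies "-a Prod=P123 -a Revs=200" onto the spider as string attributes.
    prod = getattr(spider, "Prod", "P123")            # fallback default is an assumption
    revs = int(getattr(spider, "Revs", PAGE_SIZE))    # arrives as str, so convert
    return [
        f"{API}?Filter=ProductId%3A{prod}&Offset={offset}&passkey=123qwe"
        for offset in range(0, revs, PAGE_SIZE)
    ]


# Stand-in for a spider started with:
#   scrapy crawl myspider -a Prod=P123 -a Revs=200 -o test.csv
fake_spider = SimpleNamespace(Prod="P123", Revs="200")
urls = review_urls(fake_spider)
for url in urls:
    print(url)
```

Inside the real spider, start_requests would take no extra parameters. It would loop over review_urls(self) and yield scrapy.Request(url, callback=self.parse_reviews, cb_kwargs={'ProductID': self.Prod}) for each URL. Adding pProd and pReviews to the signature of start_requests does not work because Scrapy calls that method without arguments.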
Upvotes: 1