Malik A. Rumi

Reputation: 2025

Python, Scrapy, and variable scope

First, I thought class variables were available to all objects of the class. Second, I thought a variable defined at the same level (block indentation) as a function def was within the scope of the functions defined there and of any blocks nested inside them. It doesn't make sense to me that Scrapy would change this rule. I looked this up before posting, and I don't see any difference between what I did and what is explained here: https://www.programiz.com/python-programming/global-keyword. Note that I am not getting an UnboundLocalError, if that makes a difference. I would greatly appreciate an explanation of what is going wrong here and how to fix it.
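A minimal plain-Python sketch (no Scrapy involved; the names outer, inner, Demo, bare and via_self are made up for illustration) of how the two scoping assumptions above actually play out: a nested function does see its enclosing function's variables, but a method does not see names assigned in its class body as bare names.

```python
# Names in an enclosing *function* are visible to nested functions (closures).
def outer():
    urls = ["https://example.com"]

    def inner():
        return urls  # resolved in outer()'s local scope

    return inner()


# Names in a *class body* are NOT visible to methods as bare names.
class Demo:
    urls = ["https://example.com"]

    def bare(self):
        return urls  # searched in: local scope, module globals, builtins (class body skipped)

    def via_self(self):
        return self.urls  # attribute lookup on the instance falls back to the class


print(outer())            # ['https://example.com']
print(Demo().via_self())  # ['https://example.com']
try:
    Demo().bare()
except NameError as exc:
    print(exc)            # name 'urls' is not defined
```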

import scrapy


class MySpider(scrapy.Spider):  # class / spider declaration; name is a placeholder
    ...
    start_urls = ['https://example.com']

    def parse(self, response):
        for url in start_urls:
            yield scrapy.Request(url, callback=self.parse_item)

  File "/home/malikarumi/Projects/aishah/acquire/acquire/spiders/local_etl_01.py", line 18, in parse
    for url in start_urls:
NameError: name 'start_urls' is not defined

class MySpider(scrapy.Spider):  # class / spider declaration; name is a placeholder
    ...
    start_urls = ['https://example.com']

    def parse(self, response):
        global start_urls
        for url in start_urls:
            yield scrapy.Request(url, callback=self.parse_item)

  File "/home/malikarumi/Projects/aishah/acquire/acquire/spiders/local_etl_01.py", line 18, in parse
    for url in start_urls:
NameError: name 'start_urls' is not defined
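A minimal sketch of why global does not help here, using a hypothetical FakeSpider class instead of a real Scrapy spider. The programiz page covers a function that assigns to a module-level name (the situation that produces UnboundLocalError). global start_urls only redirects the lookup to the module's globals, and a class attribute is not stored there, so the lookup still ends in NameError.

```python
# Hypothetical stand-in for a spider class; no Scrapy needed.
class FakeSpider:
    start_urls = ["https://example.com"]  # stored in the class namespace

    def parse(self):
        global start_urls  # lookup now goes straight to module globals
        return start_urls  # not found there, so NameError


print("start_urls" in globals())         # False: nothing module-level has that name
print("start_urls" in vars(FakeSpider))  # True: the name lives on the class

try:
    FakeSpider().parse()
except NameError as exc:
    print(exc)  # name 'start_urls' is not defined
```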

Upvotes: 0

Views: 116

Answers (1)

Joaquim De la Cruz

Reputation: 55

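This is a problem with OOP (Object-Oriented Programming). You have to write self.start_urls to indicate that start_urls is a class variable. Names assigned in a class body are not visible as bare names inside its methods (the class scope does not extend to the methods' code), and global does not help either, because start_urls is not a module-level name. It has to be reached through self (or the class).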

start_urls = ['https://example.com']

def parse(self, response):
    for url in self.start_urls:
        yield scrapy.Request(url, callback=self.parse_item)
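A runnable sketch of the answer's fix. So that it runs without Scrapy installed, FakeRequest is a hypothetical stand-in for scrapy.Request and ExampleSpider is a plain class rather than a scrapy.Spider subclass; the URLs are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request, so this runs without Scrapy."""
    url: str
    callback: Callable


class ExampleSpider:  # hypothetical; a real spider would subclass scrapy.Spider
    start_urls = ["https://example.com", "https://example.org"]

    def parse(self, response=None) -> Iterator[FakeRequest]:
        # self.start_urls finds the class attribute through normal attribute lookup
        for url in self.start_urls:
            yield FakeRequest(url, callback=self.parse_item)

    def parse_item(self, response=None):
        pass


requests = list(ExampleSpider().parse())
print([r.url for r in requests])  # ['https://example.com', 'https://example.org']
```

type(self).start_urls would also work; self.start_urls is the usual form because it also picks up a value assigned on the instance, which shadows the class attribute.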

Upvotes: 2
