Zach

Reputation: 1351

Which is more efficient: to use object as class attribute? Or to reinitialize when needed?

I'm building a spider / scraper with Scrapy and was wondering which would be more efficient: to initialize an API wrapper object once as a class attribute? Or reinitialize with each URL request? I'm wondering in the context of overall efficiency and memory (leaks) as this will be a fairly large project (millions of requests).

Case 1:

# init API wrapper ONCE as class attribute
class ScrapySpider():

    api = SomeAPIWrapper()

    urls = [
        'https://website.com',
        # ... +1mil URLs
    ]

    def request(self):
        for url in self.urls:
            yield Request(url)

    def parse(self, response):
        yield self.api.get_meta(response.url)

Case 2:

# init new API wrapper on EACH request
class ScrapySpider():

    urls = [
        'https://website.com',
        # ... +1mil URLs
    ]

    def request(self):
        for url in self.urls:
            yield Request(url)

    def parse(self, response):
        api = SomeAPIWrapper()
        yield api.get_meta(response.url)

Upvotes: 0

Views: 75

Answers (2)

bruno desthuilliers

Reputation: 77892

There's no generic, one-size-fits-all answer to this question. It depends on how costly the object's instantiation is, how often you end up instantiating it in the best / average / worst case, and, since your example uses a class attribute (instead of an instance attribute), whether it's safe to share this object among all instances of the host class.
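To get a rough idea of how costly instantiation is, you can time it directly. This is only a sketch: the `SomeAPIWrapper` below is a hypothetical stand-in with made-up setup work (the real wrapper's constructor is not shown in the question), and `per_call_cost` is a helper name introduced here for illustration.

```python
import timeit


class SomeAPIWrapper:
    """Hypothetical stand-in for the wrapper in the question."""

    def __init__(self):
        # Simulated setup work; a real wrapper might open a session or load config.
        self.settings = {f"key{i}": i for i in range(1000)}

    def get_meta(self, url):
        return {"url": url}


def per_call_cost(n=10_000):
    """Average number of seconds spent creating one wrapper instance."""
    return timeit.timeit(SomeAPIWrapper, number=n) / n


if __name__ == "__main__":
    print(f"{per_call_cost():.2e} s per instantiation")
```

Multiplying that figure by the expected number of requests gives a first estimate of what re-creating the object on every response would cost in total.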

Note that there are two other options besides the ones you listed:

1/ a per-instance attribute created in the initializer:

class ScrapySpider():

    def __init__(self, *args, **kw):
        super().__init__(*args, **kw)
        self.api = SomeAPIWrapper()

which avoids the concurrent access issues you might get with a class attribute, and

2/ a cached property

class ScrapySpider():

    @property
    def api(self):
        if not hasattr(self, "_cached_api"):
            self._cached_api = SomeAPIWrapper()
        return self._cached_api

which also avoids creating the SomeAPIWrapper instance before it's needed (might be useful if creating it is costly and it's not always needed) but adds a small overhead on attribute access.
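On Python 3.8+, the standard library's `functools.cached_property` gives the same lazy behaviour as the hand-written property above. A minimal sketch, assuming a hypothetical `SomeAPIWrapper` that only counts how many times it is created (the counter is added here purely for demonstration):

```python
from functools import cached_property


class SomeAPIWrapper:
    """Hypothetical stand-in; counts how many instances get created."""

    created = 0

    def __init__(self):
        type(self).created += 1


class ScrapySpider:
    @cached_property
    def api(self):
        # Runs on first access only; the result is then stored on the instance.
        return SomeAPIWrapper()


if __name__ == "__main__":
    spider = ScrapySpider()
    first = spider.api
    second = spider.api
    print(first is second, SomeAPIWrapper.created)
```

Since `cached_property` stores the value in the instance's `__dict__`, accesses after the first one are plain attribute lookups and the function body is not called again.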

Upvotes: 1

Gallaecio

Reputation: 3857

In the example code, using a class attribute (Case 1) should be more efficient, since SomeAPIWrapper is instantiated once instead of once per response.
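The difference in the number of instantiations can be illustrated without Scrapy. This is a simplified sketch, not a real spider: `SharedSpider`, `PerCallSpider` and `count_instances` are hypothetical names, `parse` takes a URL string rather than a response, and `SomeAPIWrapper` is a stand-in that just counts its own instances.

```python
class SomeAPIWrapper:
    """Hypothetical stand-in; counts how many instances get created."""

    created = 0

    def __init__(self):
        SomeAPIWrapper.created += 1

    def get_meta(self, url):
        return {"url": url}


class SharedSpider:
    # Case 1: one wrapper, created when the class body is executed.
    api = SomeAPIWrapper()

    def parse(self, url):
        return self.api.get_meta(url)


class PerCallSpider:
    def parse(self, url):
        # Case 2: a fresh wrapper for every call.
        api = SomeAPIWrapper()
        return api.get_meta(url)


def count_instances(spider, n):
    """Number of wrappers created while handling n URLs."""
    before = SomeAPIWrapper.created
    for i in range(n):
        spider.parse(f"https://website.com/{i}")
    return SomeAPIWrapper.created - before


if __name__ == "__main__":
    print(count_instances(SharedSpider(), 1000))
    print(count_instances(PerCallSpider(), 1000))
```

How much this matters in practice still depends on what the real wrapper's constructor does.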

Upvotes: 2
