grandrew

Reputation: 738

What is the fastest way to instantiate this Python class a million times?

Given I have a class:

from collections import defaultdict

class C:
    def __init__(self, flag=None):
        self.a = list()
        self.b = {}
        self.c = 0
        self.d = 1
        self.e = defaultdict(list)
        self.f = defaultdict(set)
        if flag is None:
            self.g = False
        else:
            self.g = flag
        self.something_else = (1,1,1)

    def foo(self):  # many other heavy methods
        self.a.append(self.d + self.c)
        return self.e

I need to instantiate it 1,000,000 times and then call foo(). What is the fastest way to do this? Can it be done even faster with CFFI?

e.g.

l = []
for i in range(10000000):
   o = C()
   l.append(o)

Upvotes: 1

Views: 934

Answers (3)

Glauco

Reputation: 1465

This seems to be a duplicate.

In Python you can use __slots__ to speed up instance creation, perhaps only a little, and to reduce memory usage. Migrating this class to __slots__ is trivial.

Another option is recordclass, which is designed to optimize the creation of new instances.
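As a minimal sketch of what that migration might look like, assuming the attribute names from the question's class are the only ones ever set on an instance:

```python
from collections import defaultdict


class C:
    # Declaring __slots__ removes the per-instance __dict__, so every
    # attribute assigned on an instance must be listed here.
    __slots__ = ("a", "b", "c", "d", "e", "f", "g", "something_else")

    def __init__(self, flag=None):
        self.a = []
        self.b = {}
        self.c = 0
        self.d = 1
        self.e = defaultdict(list)
        self.f = defaultdict(set)
        self.g = False if flag is None else flag
        self.something_else = (1, 1, 1)

    def foo(self):
        self.a.append(self.d + self.c)
        return self.e
```

The trade-off is that assigning an attribute not listed in __slots__ raises AttributeError, so any code that attaches ad hoc attributes to instances would need to change. How much creation time this saves depends on the Python version and is worth measuring.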

Upvotes: 0

Alain T.

Reputation: 42143

If you are not going to systematically call methods on every instance, you may want to defer initialization to the first call to another method.

from collections import defaultdict

class C:
    def __init__(self, flag=None):
        self.initDone = False
        self.g = False if flag is None else flag

    def completeInit(self):
        # Build the expensive attributes once, on first use.
        if self.initDone:
            return
        self.a = list()
        self.b = {}
        self.c = 0
        self.d = 1
        self.e = defaultdict(list)
        self.f = defaultdict(set)
        self.something_else = (1,1,1)
        self.initDone = True

    def foo(self):  # many other heavy methods
        self.completeInit()
        self.a.append(self.d + self.c)
        return self.e

This makes allocating 10M instances roughly 5 times faster.

A = [C() for _ in range(10000000)] # 3.95 sec vs 20.4

Depending on usage patterns, this may postpone the cost of initialization to a more acceptable time, or even let it run in a background process.

Alternatively, you could postpone only the more costly parts of the initialization by using properties for the list, dictionary and set attributes:

from collections import defaultdict
class C:
    def __init__(self, flag=None):
        self.initDone = False
        self.g  = False if flag is None else flag
        self.c  = 0
        self.d  = 1
        self.something_else = (1,1,1)

    def foo(self):  # many other heavy methods
        self.a.append(self.d + self.c)
        return self.e

    @property
    def a(self):
        try: return self._a
        except AttributeError:
            self._a = list()
            return self._a

    @property
    def b(self):
        try: return self._b
        except AttributeError:
            self._b = {}
            return self._b

    @property
    def e(self):
        try: return self._e
        except AttributeError:
            self._e = defaultdict(list)
            return self._e

    @property
    def f(self):
        try: return self._f
        except AttributeError:
            self._f = defaultdict(set)
            return self._f

In this case, though, it only gives a 4x speed improvement:

A = [C() for _ in range(10000000)] # 5.16 sec vs 20.4
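To check figures like these on your own machine, a small timing harness along the following lines may help. This is only a sketch: Eager, Lazy and time_creation are hypothetical names introduced here, Eager mirrors the question's constructor, Lazy mirrors the deferred constructor from the first variant, and the default instance count is kept small so it runs quickly.

```python
import timeit
from collections import defaultdict


class Eager:
    """Builds every attribute up front, as in the question."""

    def __init__(self, flag=None):
        self.a = []
        self.b = {}
        self.c = 0
        self.d = 1
        self.e = defaultdict(list)
        self.f = defaultdict(set)
        self.g = False if flag is None else flag
        self.something_else = (1, 1, 1)


class Lazy:
    """Stores only the flag; the rest would be built on first use."""

    def __init__(self, flag=None):
        self.initDone = False
        self.g = False if flag is None else flag


def time_creation(cls, n=100_000):
    """Return the seconds taken to build a list of n instances of cls."""
    return timeit.timeit(lambda: [cls() for _ in range(n)], number=1)


if __name__ == "__main__":
    for cls in (Eager, Lazy):
        print(cls.__name__, round(time_creation(cls), 3), "sec")
```

Absolute numbers will vary with hardware and Python version, so only the ratio between the two is meaningful.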

Upvotes: 1

Giovanni Rescia

Reputation: 590

What about using Python's multiprocessing?

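from multiprocessing import Pool, cpu_count

def instantiate_and_run(x):
    # C must be defined at module level so instances can be pickled back to the parent
    return C()

if __name__ == "__main__":  # needed on platforms that start workers with "spawn"
    with Pool(cpu_count()) as pool:
        l = pool.map(instantiate_and_run, range(1000000))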

Upvotes: 0
