fiacre

Reputation: 1180

Expensive operation done once in a function that is called many times, Python 3

I have a long list of groups loaded from JSON and I want a little utility:

def verify_group(group_id):
    group_ids = set()
    for grp in groups:
        group_ids.add(grp.get("pk"))
    return group_id in group_ids

The obvious approach is to load the set outside the function, or otherwise declare a global -- but let's assume I don't want a global variable.

In statically typed languages I can declare the set static, and I believe that would accomplish my aim. How would one do something similar in Python? That is: the first call initializes the set, group_ids, and subsequent calls reuse the set initialized in the first call.

BTW, when I use the profilestats package to profile this little code snippet, I see these frightening results:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      833    0.613    0.001    1.059    0.001 verify_users_groups.py:25(verify_group)
  2558976    0.253    0.000    0.253    0.000 {method 'get' of 'dict' objects}
  2558976    0.193    0.000    0.193    0.000 {method 'add' of 'set' objects}

I tried adding functools.lru_cache -- but that type of optimization doesn't address my present question -- how can I load the set group_ids once inside a def block?

Thank you for your time.

Upvotes: 2

Views: 207

Answers (1)

Bakuriu

Reputation: 101929

There isn't an equivalent of static; however, you can achieve the same effect in different ways.

One way is to abuse the infamous mutable default argument:

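def verify_group(group_id, group_ids=set()):
    # The default set is created once, when the function is defined,
    # and the same object is reused on every call.
    if not group_ids:
        group_ids.update(grp.get("pk") for grp in groups)
    return group_id in group_ids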

This, however, allows the caller to change that value (which may be a feature or a bug for you).
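To make that concrete, here is a small sketch of what a caller can do. The contents of groups are made-up sample data, and the inspection via __defaults__ is only there to show where the cached set lives:

```python
# Hypothetical sample data standing in for the real `groups` list.
groups = [{"pk": 1}, {"pk": 2}]

def verify_group(group_id, group_ids=set()):
    if not group_ids:
        group_ids.update(grp.get("pk") for grp in groups)
    return group_id in group_ids

# Normal call: fills the shared default set on first use, then checks it.
default_result = verify_group(1)

# A caller can pass its own set, which bypasses the cached one for that call.
override_result = verify_group(1, {5, 6})

# The cached set is stored on the function object itself.
shared_cache = verify_group.__defaults__[0]
```

Here default_result is True and override_result is False, even though both calls ask about the same id.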

I usually prefer using a closure:

def make_group_verifier():
    group_ids = {grp.get("pk") for grp in groups}
    def verify_group(group_id):
        # nonlocal group_ids # if you need to change its value
        return group_id in group_ids
    return verify_group

verify_group = make_group_verifier()

Python is also an OOP language, and what you describe is essentially an instance method: initialize the class with the set and call the method on the instance.

class GroupVerifier:
    def __init__(self):
        self.group_ids = {grp.get("pk") for grp in groups}
    def verify(self, group_id):
        # could be __call__
        return group_id in self.group_ids

I'd also like to add that it depends on the API design. You could let the caller take responsibility for precomputing and providing the value if they want performance. This is the choice made by, for example, random.choices: the cum_weights parameter isn't required, but it lets the user avoid the cost of computing that array on every call in performance-critical code. So instead of a mutable default argument, you use None as the default and compute the set only if the value passed is None; otherwise you assume the caller did the work for you.
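A minimal sketch of that last option, assuming groups is a list of dicts with a "pk" key (the sample data and the name known_ids are just for illustration):

```python
# Hypothetical sample data standing in for the real `groups` list.
groups = [{"pk": 1}, {"pk": 2}, {"pk": 3}]

def verify_group(group_id, group_ids=None):
    # No precomputed set supplied: build it on this call (the slow path).
    if group_ids is None:
        group_ids = {grp.get("pk") for grp in groups}
    return group_id in group_ids

# Performance-sensitive callers build the set once and pass it in every time.
known_ids = {grp.get("pk") for grp in groups}
results = [verify_group(gid, known_ids) for gid in (1, 2, 99)]
```

Casual callers can still write verify_group(3) and pay the cost of rebuilding the set on each call; only callers who care about speed need to know about the second parameter.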

Upvotes: 3
