SomethingSomething
SomethingSomething

Reputation: 12186

Is repeated dictionary access optimized in Python

Consider the following Python code, that iterates over an array of words and counts them into the dictionary a['words']

a['words'] = {}
for word in words:
    if word not in a['words']:
        a['words'][word] = 0
    a['words'][word] += 1

The question is, whether the repeated access to a['words'] is optimized in Python, in a way that the reference of a['words'] is automatically saved somewhere until changed, OR should I write myself the "optimized" code, this way:

a['words'] = {}
words_dict = a['words']
for word in words:
    if word not in words_dict:
        words_dict[word] = 0
    words_dict[word] += 1

Upvotes: 3

Views: 260

Answers (3)

Sraw
Sraw

Reputation: 20224

I think it is possible for an interpreter. So I am going to make a discussion on how to.

First we need to specify the question, correct me if I am wrong:

Given a lookup action and a scope, if the result of lookup action isn't changed in this scope, interpreter should cache the result.

Caching is just a piece of cake, so what really valuable to discuss are unchanged lookup action and scope.

As far as I know, the minimal scope in python is function block(actually I am not sure about it). But the unchanged lookup action scope can be just part of a function block. So in this case, in order to cache this action, a new scope which is useless most of time should be introduced into python's runtime.


Thinking in another way, could the unchanged lookup action be detected during compiling? I think... it might be possible. Pypy use JIT tech to optimize performance, especially in some loop block(for example, a http server). So maybe Pypy has already done this feature?

Upvotes: 0

Jean-François Fabre
Jean-François Fabre

Reputation: 140188

for word in words:
    if word not in words_dict:
        words_dict[word] = 0
    words_dict[word] += 1

performs up to 3 dict accesses per loop. Even if access is O(1), hashing is far from free, specially on string objects.

In that particular case collections.Counter is perfectly suited. For other cases (like creating a list or appending to it), collections.defaultdict is a good alternative, and it's faster. Fictive alternate example:

c = collections.defaultdict(list)
for i,word in enumerate(words):
    c[word].append(i)

there's also the dict.setdefault() solution, if you want to avoid collections module.

Upvotes: 2

zipa
zipa

Reputation: 27869

Good solution is collections.Counter as it is high-performance container:

from collections import Counter
words = ['aaa', 'bbb', 'ccc', 'ddd', 'aaa', 'bbb', 'eee']
a = {'words' : dict(Counter(words))}
a
#{'words': {'aaa': 2, 'bbb': 2, 'ccc': 1, 'ddd': 1, 'eee': 1}}

Upvotes: 4

Related Questions