Reputation: 12186
Consider the following Python code, that iterates over an array of words and counts them into the dictionary a['words']
a['words'] = {}
for word in words:
if word not in a['words']:
a['words'][word] = 0
a['words'][word] += 1
The question is, whether the repeated access to a['words']
is optimized in Python, in a way that the reference of a['words']
is automatically saved somewhere until changed, OR should I write myself the "optimized" code, this way:
a['words'] = {}
words_dict = a['words']
for word in words:
if word not in words_dict:
words_dict[word] = 0
words_dict[word] += 1
Upvotes: 3
Views: 260
Reputation: 20224
I think it is possible for an interpreter. So I am going to make a discussion on how to.
First we need to specify the question, correct me if I am wrong:
Given a lookup action and a scope, if the result of lookup action isn't changed in this scope, interpreter should cache the result.
Caching is just a piece of cake, so what really valuable to discuss are unchanged lookup action and scope.
As far as I know, the minimal scope in python is function block(actually I am not sure about it). But the unchanged lookup action scope can be just part of a function block. So in this case, in order to cache this action, a new scope which is useless most of time should be introduced into python's runtime.
Thinking in another way, could the unchanged lookup action be detected during compiling? I think... it might be possible. Pypy use JIT tech to optimize performance, especially in some loop block(for example, a http server). So maybe Pypy has already done this feature?
Upvotes: 0
Reputation: 140188
for word in words:
if word not in words_dict:
words_dict[word] = 0
words_dict[word] += 1
performs up to 3 dict accesses per loop. Even if access is O(1)
, hashing is far from free, specially on string objects.
In that particular case collections.Counter
is perfectly suited. For other cases (like creating a list or appending to it), collections.defaultdict
is a good alternative, and it's faster. Fictive alternate example:
c = collections.defaultdict(list)
for i,word in enumerate(words):
c[word].append(i)
there's also the dict.setdefault()
solution, if you want to avoid collections
module.
Upvotes: 2
Reputation: 27869
Good solution is collections.Counter as it is high-performance container:
from collections import Counter
words = ['aaa', 'bbb', 'ccc', 'ddd', 'aaa', 'bbb', 'eee']
a = {'words' : dict(Counter(words))}
a
#{'words': {'aaa': 2, 'bbb': 2, 'ccc': 1, 'ddd': 1, 'eee': 1}}
Upvotes: 4