Reputation: 5409
As an example, suppose I'm calculating an average of some variable within some categories. One way to implement this calculation is the following:
from collections import defaultdict
avg = defaultdict(lambda:[0,0])
for cat_name, val in data:
    cat = avg[cat_name]
    cat[0] += 1
    cat[1] += val
avg = { cat: s/n for cat,(n,s) in avg.items() }
This requires creating a new dictionary from the original one.
Alternatively, the division can be done in a loop like this:
for cat,(n,s) in avg.items():
    avg[cat] = s/n
This doesn't create a new dictionary object, but requires a key lookup on every iteration, which has the same complexity.
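A minimal sketch of the difference between the two approaches, using made-up sample data and a hypothetical helper named accumulate: the comprehension produces a second dict, while the loop keeps the same object and only rebinds values for keys that already exist.

```python
from collections import defaultdict

# Made-up sample data, for illustration only.
data = [("a", 1.0), ("a", 3.0), ("b", 10.0)]

def accumulate(pairs):
    """Hypothetical helper: collect [count, sum] per category."""
    acc = defaultdict(lambda: [0, 0])
    for cat_name, val in pairs:
        cat = acc[cat_name]
        cat[0] += 1
        cat[1] += val
    return acc

# Approach 1: the comprehension builds a second dict.
acc1 = accumulate(data)
avg1 = {cat: s / n for cat, (n, s) in acc1.items()}
rebuilt_is_new_object = avg1 is not acc1

# Approach 2: assigning to existing keys keeps the same dict object.
acc2 = accumulate(data)
alias = acc2
for cat, (n, s) in acc2.items():
    acc2[cat] = s / n  # existing key, so the dict's size never changes
updated_in_place = alias is acc2
```

Assigning to a key that is already present does not change the dict's size, so doing it while iterating is safe; it still costs one hash lookup per assignment.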
The thing is, the structure of the dictionary doesn't need to change, so no hash table lookups are actually necessary. The old values could just be replaced with the new ones in place. But is this possible to do in Python?
In C++20, one can do something along these lines:
std::unordered_map<std::string,std::variant<std::tuple<unsigned,double>,double>> avg;
// loop over data to accumulate counts and sums
for (auto& [cat,x] : avg) {
    auto [n,s] = std::get<0>(x);  // std::variant has no get member; use std::get
    x = s/n;
}
which transforms the dictionary values in-place.
Is there a way to do something like that in Python?
Addendum: I only used the problem of calculating the averages as a concrete example. The question is about efficient transformation of the dictionary values.
While a list can be used to emulate a reference type, the solution I was thinking about would be along the lines of getting mutable access to the underlying storage of the dict object. Something like the items() method, but that returns something mutable instead of tuples. Or maybe a function that takes a dict and a callback function that is used to transform the values, a la map, but in-place. In case the prose is not clear, something of this sort:
for x in dict_instance.mutable_items():
    x.value = x.value[1]/x.value[0]
or
dict_instance.transform_values(lambda v: v[1]/v[0])
Addendum 2: I also realize that I could just change the meaning of one of the original list elements, say from sum to average, by doing this:
for val in avg.values():
    val[1] = val[1]/val[0]
I wanted to know if there is a way to replace the whole value object and avoid lookup.
Upvotes: 0
Views: 296
Reputation: 130
Actually, you can use a list trick: store a one-element list inside each value and write the average into it.
from collections import defaultdict
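# The third slot is a one-element list that will hold the average
avg = defaultdict(lambda: [0, 0, [0]])
for cat_name, val in data:
    cat = avg[cat_name]
    cat[0] += 1    # count
    cat[1] += val  # sum
# Unpack in storage order (count, sum, holder) and write the average in place
for n, s, lst in avg.values():
    lst[0] = s / n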
The solution doesn't require a key lookup, but it increases the memory usage.
Or
If the running total of val is only needed for computing the average, you can use an incremental mean instead.
Incremental Mean Formula from David Silver's lecture slide
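The incremental mean update is mean_k = mean_(k-1) + (x_k - mean_(k-1)) / k. A minimal sketch of how it could be applied per category, assuming made-up sample data and two separate dicts named count and mean (both names are illustrative, not from the question):

```python
from collections import defaultdict

# Made-up sample data, for illustration only.
data = [("a", 1.0), ("a", 3.0), ("b", 10.0)]

count = defaultdict(int)   # number of values seen per category
mean = defaultdict(float)  # running mean per category

for cat_name, val in data:
    count[cat_name] += 1
    # mean_k = mean_(k-1) + (x_k - mean_(k-1)) / k
    mean[cat_name] += (val - mean[cat_name]) / count[cat_name]
```

With this approach mean always holds the current average, so no second pass over the dictionary is needed. It does not avoid key lookups during accumulation, and the result can differ from sum/count in the last floating-point digits.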
Upvotes: 1