SU3

Reputation: 5409

Efficient way to transform dict values

As an example, suppose I'm calculating an average of some variable within some categories. One way to implement this calculation is the following:

from collections import defaultdict

avg = defaultdict(lambda:[0,0])
for cat_name, val in data:
    cat = avg[cat_name]
    cat[0] += 1
    cat[1] += val

avg = { cat: s/n for cat,(n,s) in avg.items() }

This requires creating a new dictionary from the original one.

Alternatively, the division can be done in a loop like this:

for cat,(n,s) in avg.items():
    avg[cat] = s/n

This doesn't create a new dictionary object, but it requires a key lookup on every iteration, so the overall complexity is the same.

The thing is, the structure of the dictionary doesn't need to change, so no hash table lookups are actually necessary. The old values could just be replaced with the new ones in place. But is this possible in Python?

In C++20, one can do something along these lines:

std::unordered_map<std::string,std::variant<std::tuple<unsigned,double>,double>> avg;
// loop over data to accumulate counts and sums
for (auto& [cat,x] : avg) {
  auto [n,s] = std::get<0>(x);
  x = s/n;
}

which transforms the dictionary values in-place.

Is there a way to do something like that in Python?


Addendum: I only used the problem of calculating the averages as a concrete example. The question is about efficient transformation of the dictionary values.

While a list can be used to emulate a reference type, the solution I was thinking about would be along the lines of getting mutable access to the underlying storage of the dict object. Something like the items() method, but that returns something mutable instead of tuples. Or maybe a function that takes a dict and a callback function that is used to transform the values, a la map, but in-place. In case the prose is not clear, something of this sort:

for x in dict_instance.mutable_items():
    x.value = x.value[1]/x.value[0]

or

dict_instance.transform_values(lambda v: v[1]/v[0])
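To make the second form concrete, here is a minimal sketch of a free function with that shape. The name transform_values is hypothetical (dict has no such method), and this version still performs one hashed store per key, so it only illustrates the desired interface, not a lookup-free implementation:

```python
def transform_values(d, func):
    """Replace every value in d with func(value), keeping the same dict object.

    Hypothetical helper, not part of the dict API. Overwriting the values of
    existing keys while iterating is allowed because the dict's size never changes.
    """
    for key, value in d.items():
        d[key] = func(value)  # still a hashed store for each key


avg = {"a": [2, 10.0], "b": [4, 2.0]}  # made-up sample: category -> [count, sum]
same_object = avg
transform_values(avg, lambda v: v[1] / v[0])
```

After the call, avg is still the same dict object, with each [count, sum] list replaced by the average.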

Addendum 2: I also realize that I could just change the meaning of one of the original list elements, say from sum to average, by doing this:

for val in avg.values():
    val[1] = val[1]/val[0]

I wanted to know if there is a way to replace the whole value object and avoid lookup.

Upvotes: 0

Views: 296

Answers (1)

Ahmet Dundar

Reputation: 130

Actually, you can use a list trick.

from collections import defaultdict
# Each value is [count, sum, [average]]; the inner one-item list holds the average
avg = defaultdict(lambda: [0, 0, [0]])
for cat_name, val in data:
    cat = avg[cat_name]
    cat[0] += 1
    cat[1] += val

# Unpack in storage order: index 0 is the count, index 1 is the sum
for _, (n, s, lst) in avg.items():
    lst[0] = s / n

The solution doesn't require a key lookup, but it increases the memory usage.

Or

If the running total of val is only used to compute the average, you can use an incremental mean instead.

Incremental mean formula from David Silver's lecture slides (image not available).
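The standard incremental mean update is mean_k = mean_(k-1) + (x_k - mean_(k-1)) / k, which keeps the running average directly instead of a sum. A minimal sketch, assuming data is an iterable of (category, value) pairs as in the question; the sample data and the name stats are made up for illustration:

```python
from collections import defaultdict

# Made-up sample input: (category, value) pairs
data = [("a", 1.0), ("b", 4.0), ("a", 3.0), ("b", 6.0), ("a", 5.0)]

# Each category maps to [count, running_mean]
stats = defaultdict(lambda: [0, 0.0])
for cat_name, val in data:
    entry = stats[cat_name]
    entry[0] += 1
    # Move the mean toward the new value by 1/count of the difference
    entry[1] += (val - entry[1]) / entry[0]
```

With this approach the average is available as entry[1] at any point, so no final division pass over the dictionary is needed. The values are still lists, though, so this avoids the second loop rather than replacing the value objects.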

Upvotes: 1
