Reputation: 337
What's the easiest and most efficient way to reduce the duplication of data?
I tried to write an algorithm myself, but it quickly got way too complicated.
I have the data kept in an array like this: [[date, 'country_code', value], [date, 'country_code', value], ...]
For example, I have [[2019-01-23, "GER", 200],[2019-01-23,"USA",300],[2019-01-23,"GER", 301]].
And I need:
[[2019-01-23,"GER", 501],[2019-01-23,"USA",300]]
Upvotes: 2
Views: 53
Reputation: 3170
The most idiomatic way to do that is to use a Counter from the collections module:
>>> from collections import Counter
>>> data = [
... ['2019-01-23', 'GER', 200],
... ['2019-01-23', 'USA', 300],
... ['2019-01-23', 'GER', 301],
... ]
>>> counter = Counter()
>>> for date, country_code, count in data:
... counter[(date, country_code)] += count
...
>>> counter
Counter({('2019-01-23', 'GER'): 501, ('2019-01-23', 'USA'): 300})
>>> output_data = [[date, country_code, count] for (date, country_code), count in counter.items()]
>>> output_data
[['2019-01-23', 'GER', 501], ['2019-01-23', 'USA', 300]]
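As a side note (not part of the original answer): if you also want the rows ordered by the aggregated value, Counter.most_common() returns the (key, count) pairs sorted by count, descending:

```python
from collections import Counter

# The counter built above, repeated here so the snippet runs on its own.
counter = Counter({('2019-01-23', 'GER'): 501, ('2019-01-23', 'USA'): 300})

# most_common() yields (key, count) pairs sorted by count, descending.
rows = [[date, code, n] for (date, code), n in counter.most_common()]
print(rows)  # [['2019-01-23', 'GER', 501], ['2019-01-23', 'USA', 300]]
```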
Upvotes: 1
Reputation: 362786
Accumulate with a defaultdict, and use a list comprehension to collect the results:
>>> from collections import defaultdict
>>> L = [
...     ['2019-01-23', 'GER', 200],
...     ['2019-01-23', 'USA', 300],
...     ['2019-01-23', 'GER', 301],
... ]
>>> d = defaultdict(int)
>>> for date, code, n in L:
... d[date, code] += n
...
>>> [[date, code, n] for (date, code), n in d.items()]
[['2019-01-23', 'GER', 501], ['2019-01-23', 'USA', 300]]
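For comparison, the same aggregation can also be sketched with itertools.groupby after sorting (this is an alternative, not the answer's approach; the input list is repeated so the snippet runs on its own):

```python
from itertools import groupby
from operator import itemgetter

L = [
    ['2019-01-23', 'GER', 200],
    ['2019-01-23', 'USA', 300],
    ['2019-01-23', 'GER', 301],
]

# groupby only merges *adjacent* equal keys, so sort by (date, code) first.
key = itemgetter(0, 1)
result = [
    [date, code, sum(row[2] for row in rows)]
    for (date, code), rows in groupby(sorted(L, key=key), key=key)
]
print(result)  # [['2019-01-23', 'GER', 501], ['2019-01-23', 'USA', 300]]
```

The dict-based accumulators above are O(n) and usually simpler; groupby mainly pays off when the data is already sorted.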
Upvotes: 4