Reputation: 2563
I have a list of dicts in Python like this:
[
    {
        "25-34": {
            "Clicks": 10
        },
        "45-54": {
            "Clicks": 2
        },
    },
    {
        "25-34": {
            "Clicks": 20
        },
        "45-54": {
            "Clicks": 10
        },
    }
]
How can I sum the values under each key across the dicts in the list, so that I get:
{
    "25-34": {
        "Clicks": 30
    },
    "45-54": {
        "Clicks": 12
    },
}
I tried using Counter(), which works easily when the dicts inside the list are flat, but with nested dicts like the above it gives this error:
/usr/lib/python2.7/collections.pyc in update(self, iterable, **kwds)
524 self_get = self.get
525 for elem, count in iterable.iteritems():
--> 526 self[elem] = self_get(elem, 0) + count
527 else:
528 super(Counter, self).update(iterable) # fast path when counter is empty
TypeError: unsupported operand type(s) for +: 'dict' and 'dict'
How can I achieve the summation described above?
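For reference, the error can be reproduced with a minimal sketch like the one below. It assumes Python 3 syntax and uses the sample data from above; the names `total` and `error` are only illustrative.

```python
from collections import Counter

data = [
    {"25-34": {"Clicks": 10}, "45-54": {"Clicks": 2}},
    {"25-34": {"Clicks": 20}, "45-54": {"Clicks": 10}},
]

total = Counter()
error = None
try:
    for d in data:
        # The first update on an empty Counter simply stores the nested dicts.
        # The second update tries to add a dict to a dict, which fails.
        total.update(d)
except TypeError as exc:
    error = exc

print(error)
```

Counter.update adds the incoming values to the stored ones, so it only works when the values themselves support `+`, which plain dicts do not.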
NOTE: I have added Clicks just as a sample. The nested dicts can have any number of keys.
Another example to make it clearer:
[
    {
        "25-34": {
            "Clicks": 10,
            "Visits": 1
        },
        "45-54": {
            "Clicks": 2,
            "Visits": 2
        },
    },
    {
        "25-34": {
            "Clicks": 20,
            "Visits": 3
        },
        "45-54": {
            "Clicks": 10,
            "Visits": 4
        },
    }
]
Expected output:
{
    "25-34": {
        "Clicks": 30,
        "Visits": 4
    },
    "45-54": {
        "Clicks": 12,
        "Visits": 6
    },
}
Upvotes: 8
Views: 6049
Reputation: 504
I did it like this:
import gzip
from collections import Counter, defaultdict
from Bio import SeqIO
with gzip.open("data/small_fasta.fa.gz", "rt") as handle:
    aac_count = defaultdict(Counter)
    for record in SeqIO.parse(handle, "fasta"):
        aac_count[record.id].update(record.seq)
I used Biopython to open the FASTA file (https://pt.wikipedia.org/wiki/Formato_FASTA), which is a file type I use a lot.
It has a header ('>proteinx') and, on the next line, a sequence (a string of DNA or protein residues). Biopython is an easy way to deal with FASTA files.
Then I used defaultdict and Counter from collections. record.id is the header and serves as the key, and I update the counter with the sequence to count how often each character occurs in the string.
The output is something like this in my case:
defaultdict(collections.Counter,
{'UniRef100_Q6GZX4': Counter({'M': 6,
'A': 13,
'F': 8,
'S': 13,
'E': 15,
'D': 17,
'V': 21,
'L': 25,
'K': 29,
'Y': 14,
'R': 15,
'P': 11,
'N': 8,
'W': 4,
'Q': 9,
'C': 4,
'G': 15,
'I': 12,
'H': 9,
'T': 8}),
'UniRef100_Q6GZX3': Counter({'M': 7,
'S': 22,
'I': 10,
'G': 23,
'A': 26,
'T': 26,
'R': 16,
'L': 14,
'Q': 13,
'N': 9,
'D': 24,
'K': 17,
'Y': 11,
'P': 37,
'C': 18,
'F': 9,
'W': 6,
'E': 6,
'V': 23,
'H': 3}),...}
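The same pattern can be tried without Biopython or a FASTA file. The sketch below is only an illustration: the `records` list of (id, sequence) pairs is hypothetical and stands in for the parsed records.

```python
from collections import Counter, defaultdict

# Hypothetical (id, sequence) pairs standing in for parsed FASTA records.
records = [
    ("seq1", "MKVLA"),
    ("seq2", "MMAAK"),
    ("seq1", "KKV"),
]

aac_count = defaultdict(Counter)
for record_id, sequence in records:
    # Updating with a string counts each character; repeated ids accumulate.
    aac_count[record_id].update(sequence)

print(dict(aac_count["seq1"]))
```

Passing a dict such as {"Clicks": 10} to update instead of a string adds its values to the running totals, which is how this approach carries over to the nested dicts in the question.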
Upvotes: 0
Reputation: 13860
I would use defaultdict with a default of int (which is 0):
from collections import defaultdict
counter = defaultdict(int)
for current_dict in data:
    for key, value in current_dict.items():
        counter[key] += sum(value.values())
This is the most readable way to count the values in my opinion.
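As a quick check, here is the loop above run on the second example from the question (a sketch, with `data` filled in from that example). Note that it produces one total per age group, so Clicks and Visits are added together rather than kept as separate keys.

```python
from collections import defaultdict

data = [
    {"25-34": {"Clicks": 10, "Visits": 1}, "45-54": {"Clicks": 2, "Visits": 2}},
    {"25-34": {"Clicks": 20, "Visits": 3}, "45-54": {"Clicks": 10, "Visits": 4}},
]

counter = defaultdict(int)
for current_dict in data:
    for key, value in current_dict.items():
        # Adds every inner value (Clicks and Visits alike) to the group total.
        counter[key] += sum(value.values())

print(dict(counter))  # {'25-34': 34, '45-54': 18}
```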
Upvotes: 1
Reputation: 111
My variation without list comprehensions:
def my_dict_sum(data):
    """
    >>> test_data = [{"25-34": {"Clicks": 10, "Visits": 1}, "45-54": {"Clicks": 2, "Visits": 2}, },{"25-34": {"Clicks": 20, "Visits": 3}, "45-54": {"Clicks": 10, "Visits": 4}, }]
    >>> my_dict_sum(test_data)
    {'45-54': {'Clicks': 12, 'Visits': 6}, '25-34': {'Clicks': 30, 'Visits': 4}}
    """
    # Start from the first dict; note that it is updated in place.
    result_key = data[0]
    for x in data[1:]:
        for y in x:
            # Only keys already present in the first dict are summed.
            if y in result_key:
                for z in x[y]:
                    if z in result_key[y]:
                        result_key[y][z] = result_key[y][z] + x[y][z]
    return result_key
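If modifying the first input dict is a concern, or if later dicts may contain keys the first one lacks, a variant along these lines might help. This is only a sketch; the name `nested_dict_sum` is made up for illustration.

```python
def nested_dict_sum(data):
    """Sum the inner values per outer key without modifying the input."""
    result = {}
    for outer in data:
        for group, metrics in outer.items():
            # Create the inner dict the first time a group is seen.
            totals = result.setdefault(group, {})
            for name, value in metrics.items():
                totals[name] = totals.get(name, 0) + value
    return result


test_data = [
    {"25-34": {"Clicks": 10, "Visits": 1}, "45-54": {"Clicks": 2, "Visits": 2}},
    {"25-34": {"Clicks": 20, "Visits": 3}, "45-54": {"Clicks": 10, "Visits": 4}},
]
summed = nested_dict_sum(test_data)
print(summed)
```

Building a fresh result dict leaves test_data untouched, at the cost of a couple of extra lookups per value.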
Upvotes: 1
Reputation: 54313
For your first question, here's a one-liner. It's not really pretty, but it does use Counter:
sum((Counter({k:v['Clicks'] for k,v in d.items()}) for d in data), Counter())
As an example:
data = [
    {
        "25-34": {
            "Clicks": 10
        },
        "45-54": {
            "Clicks": 2
        },
    },
    {
        "25-34": {
            "Clicks": 20
        },
        "45-54": {
            "Clicks": 10
        },
    }
]
from collections import Counter
c = sum((Counter({k:v['Clicks'] for k,v in d.items()}) for d in data), Counter())
print(c)
It outputs:
Counter({'25-34': 30, '45-54': 12})
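The second argument to sum in the one-liner matters. The small sketch below (with an illustrative `parts` list) shows what happens with and without the Counter() start value.

```python
from collections import Counter

parts = [
    Counter({"25-34": 10, "45-54": 2}),
    Counter({"25-34": 20, "45-54": 10}),
]

# With an explicit Counter() start value, the counters are added pairwise.
total = sum(parts, Counter())

# Without it, sum() starts from the integer 0, and 0 + Counter is not defined.
try:
    sum(parts)
    failed = False
except TypeError:
    failed = True

print(total, failed)
```

One thing to keep in mind: adding Counters with `+` keeps only positive counts, so keys whose total is zero or negative would be dropped from the result.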
Upvotes: 0
Reputation: 96360
From your edit, it sounds like you are just trying to sum the values of all the sub-dicts, by the parent dict:
In [9]: counts = Counter()

In [10]: for dd in data:
    ...:     for k, v in dd.items():
    ...:         counts[k] += sum(v.values())
    ...:

In [11]: counts
Out[11]: Counter({'25-34': 30, '45-54': 12})
Fundamentally, this is an unwieldy data structure.
OK, given your last update, I think the easiest thing would be to go with a defaultdict with a Counter factory:
In [17]: from collections import Counter, defaultdict

In [18]: counts = defaultdict(Counter)

In [19]: for dd in data:
    ...:     for k, d in dd.items():
    ...:         counts[k].update(d)
    ...:

In [20]: counts
Out[20]:
defaultdict(collections.Counter,
            {'25-34': Counter({'Clicks': 30, 'Visits': 4}),
             '45-54': Counter({'Clicks': 12, 'Visits': 6})})
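If plain nested dicts are needed at the end, as in the expected output from the question, the result can be converted with a dict comprehension. A minimal sketch, assuming `data` holds the second example (the name `result` is just for illustration):

```python
from collections import Counter, defaultdict

data = [
    {"25-34": {"Clicks": 10, "Visits": 1}, "45-54": {"Clicks": 2, "Visits": 2}},
    {"25-34": {"Clicks": 20, "Visits": 3}, "45-54": {"Clicks": 10, "Visits": 4}},
]

counts = defaultdict(Counter)
for dd in data:
    for k, d in dd.items():
        # Counter.update adds the values of d to the running totals for k.
        counts[k].update(d)

# Convert the defaultdict of Counters into ordinary nested dicts.
result = {k: dict(v) for k, v in counts.items()}
print(result)
```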
Upvotes: 8