anekix

Reputation: 2563

Using Counter with nested dictionaries in Python

I have a list of dicts in Python like this:

[
  {
    "25-34": {
      "Clicks": 10
    },
    "45-54": {
      "Clicks": 2
    },

  },
  {
    "25-34": {
      "Clicks": 20
    },
    "45-54": {
      "Clicks": 10
    },

  }   
]

How can I sum the values for each key across the dicts in the list, so that I get:

{
    "25-34": {
        "Clicks": 30
    },
    "45-54": {
        "Clicks": 12
    },

}

I tried using Counter(). It works easily when the dicts inside the list are flat, but with nested dicts like the ones above it raises this error:

    /usr/lib/python2.7/collections.pyc in update(self, iterable, **kwds)
        524                     self_get = self.get
        525                     for elem, count in iterable.iteritems():
    --> 526                         self[elem] = self_get(elem, 0) + count
        527                 else:
        528                     super(Counter, self).update(iterable) # fast path when counter is empty

TypeError: unsupported operand type(s) for +: 'dict' and 'dict'
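
For reference, a minimal sketch that should reproduce the same failure on Python 3 as well. The names `total` and `error_message` are only illustrative and not part of my real code:

```python
from collections import Counter

data = [
    {"25-34": {"Clicks": 10}, "45-54": {"Clicks": 2}},
    {"25-34": {"Clicks": 20}, "45-54": {"Clicks": 10}},
]

total = Counter()
error_message = None
try:
    for d in data:
        # Counter.update() adds values with "+", which is not defined
        # for two dicts, so merging the second dict raises TypeError.
        total.update(d)
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```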

How can I achieve the summation described above?

NOTE: I have included only Clicks as a sample. The nested dicts can have any number of keys. Another example to make this clearer:

[
  {
    "25-34": {
      "Clicks": 10,
      "Visits": 1
    },
    "45-54": {
      "Clicks": 2,
      "Visits": 2
    },

  },
  {
    "25-34": {
      "Clicks": 20,
      "Visits": 3
    },
    "45-54": {
      "Clicks": 10,
      "Visits": 4
    },

  }   
]

Expected output:

{
    "25-34": {
      "Clicks": 30,
      "Visits": 4
    },
    "45-54": {
      "Clicks": 12,
      "Visits": 6
    },

  }  

Upvotes: 8

Views: 6049

Answers (5)

Paulo Sergio Schlogl

Reputation: 504

I did it like this:

import gzip
from collections import Counter, defaultdict
from Bio import SeqIO  # Biopython

with gzip.open("data/small_fasta.fa.gz", "rt") as handle:
    aac_count = defaultdict(Counter)
    for record in SeqIO.parse(handle, "fasta"):
        aac_count[record.id].update(record.seq)

I used Biopython to open the FASTA file (https://pt.wikipedia.org/wiki/Formato_FASTA), which is a type of file I work with a lot.

Each record has a header line (for example '>proteinx') followed by a sequence (a string of DNA or protein residues). Biopython is an easy way to deal with FASTA files.

Then I used defaultdict and Counter from collections. record.id is the header and serves as the key, and I update the Counter with the sequence to count how many times each character occurs in the string.

The output is something like this in my case:

defaultdict(collections.Counter,
            {'UniRef100_Q6GZX4': Counter({'M': 6,
                      'A': 13,
                      'F': 8,
                      'S': 13,
                      'E': 15,
                      'D': 17,
                      'V': 21,
                      'L': 25,
                      'K': 29,
                      'Y': 14,
                      'R': 15,
                      'P': 11,
                      'N': 8,
                      'W': 4,
                      'Q': 9,
                      'C': 4,
                      'G': 15,
                      'I': 12,
                      'H': 9,
                      'T': 8}),
             'UniRef100_Q6GZX3': Counter({'M': 7,
                      'S': 22,
                      'I': 10,
                      'G': 23,
                      'A': 26,
                      'T': 26,
                      'R': 16,
                      'L': 14,
                      'Q': 13,
                      'N': 9,
                      'D': 24,
                      'K': 17,
                      'Y': 11,
                      'P': 37,
                      'C': 18,
                      'F': 9,
                      'W': 6,
                      'E': 6,
                      'V': 23,
                      'H': 3}),...}

Upvotes: 0

Or Duan

Reputation: 13860

I would use a defaultdict with int as the default factory (which yields 0):

from collections import defaultdict
counter = defaultdict(int)

for current_dict in data:
    for key, value in current_dict.items():
        counter[key] += sum(value.values())

This is the most readable way to count the values in my opinion.
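
As a quick check, here is a sketch of the same loop run against the second sample from the question (the `data` literal is copied from there). Note that it adds every nested value into a single number per age group, so Clicks and Visits end up combined:

```python
from collections import defaultdict

data = [
    {"25-34": {"Clicks": 10, "Visits": 1}, "45-54": {"Clicks": 2, "Visits": 2}},
    {"25-34": {"Clicks": 20, "Visits": 3}, "45-54": {"Clicks": 10, "Visits": 4}},
]

counter = defaultdict(int)
for current_dict in data:
    for key, value in current_dict.items():
        # sum(value.values()) collapses all metrics of one age group
        counter[key] += sum(value.values())

print(dict(counter))  # {'25-34': 34, '45-54': 18}
```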

Upvotes: 1

Eugene Dennis

Reputation: 111

My variation without list comprehensions:

def my_dict_sum(data):
    """
    >>> test_data = [{"25-34": {"Clicks": 10, "Visits": 1}, "45-54": {"Clicks": 2, "Visits": 2}, },{"25-34": {"Clicks": 20, "Visits": 3}, "45-54": {"Clicks": 10, "Visits": 4}, }]
    >>> my_dict_sum(test_data)
    {'45-54': {'Clicks': 12, 'Visits': 6}, '25-34': {'Clicks': 30, 'Visits': 4}}
    """
    result_key = data[0]
    for x in data[1:]:
        for y in x:
            if y in result_key:
                for z in x[y]:
                    if z in result_key[y]:
                        result_key[y][z] = result_key[y][z] + x[y][z]
    return result_key

Upvotes: 1

Eric Duminil

Reputation: 54313

For your first question, here's a one-liner. It's not really pretty, but it does use Counter:

sum((Counter({k:v['Clicks'] for k,v in d.items()}) for d in data), Counter())

As an example:

data = [
  {
    "25-34": {
      "Clicks": 10
    },
    "45-54": {
      "Clicks": 2
    },

  },
  {
    "25-34": {
      "Clicks": 20
    },
    "45-54": {
      "Clicks": 10
    },

  }   
]

from collections import Counter

c = sum((Counter({k:v['Clicks'] for k,v in d.items()}) for d in data), Counter())
print(c)

It outputs:

Counter({'25-34': 30, '45-54': 12})
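
If the nested dicts hold more than one metric, the same expression could be wrapped in a small helper that takes the metric name. This is only a sketch: `sum_metric` is a hypothetical name, and it assumes every nested dict contains the requested key. Keep in mind that adding Counters with `+` drops entries whose total is zero or negative.

```python
from collections import Counter

def sum_metric(data, metric):
    # Build one flat Counter per dict in the list, then add them together.
    return sum(
        (Counter({k: v[metric] for k, v in d.items()}) for d in data),
        Counter(),
    )

data = [
    {"25-34": {"Clicks": 10, "Visits": 1}, "45-54": {"Clicks": 2, "Visits": 2}},
    {"25-34": {"Clicks": 20, "Visits": 3}, "45-54": {"Clicks": 10, "Visits": 4}},
]

clicks = sum_metric(data, "Clicks")
visits = sum_metric(data, "Visits")
print(clicks)
print(visits)
```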

Upvotes: 0

juanpa.arrivillaga

Reputation: 96360

From your edit, it sounds like you are just trying to sum the values of all the sub-dicts, by the parent dict:

In [9]: counts = Counter()

In [10]: for dd in data:
    ...:     for k,v in dd.items():
    ...:         counts[k] += sum(v.values())
    ...:

In [11]: counts
Out[11]: Counter({'25-34': 30, '45-54': 12})

Fundamentally, this is an unwieldy data-structure.

OK, given your last update, I think the easiest thing would be to go with a defaultdict with a Counter factory:

In [17]: from collections import Counter, defaultdict

In [18]: counts = defaultdict(Counter)

In [19]: for dd in data:
    ...:     for k, d in dd.items():
    ...:         counts[k].update(d)
    ...:

In [20]: counts
Out[20]:
defaultdict(collections.Counter,
            {'25-34': Counter({'Clicks': 30, 'Visits': 4}),
             '45-54': Counter({'Clicks': 12, 'Visits': 6})})
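
If you need plain nested dicts in the shape shown in the question rather than a defaultdict of Counters, one way is to convert at the end. A minimal sketch, where `plain` is just an illustrative name and `data` is the second sample from the question:

```python
from collections import Counter, defaultdict

data = [
    {"25-34": {"Clicks": 10, "Visits": 1}, "45-54": {"Clicks": 2, "Visits": 2}},
    {"25-34": {"Clicks": 20, "Visits": 3}, "45-54": {"Clicks": 10, "Visits": 4}},
]

counts = defaultdict(Counter)
for dd in data:
    for k, d in dd.items():
        # Each age group gets its own Counter; update() adds per metric.
        counts[k].update(d)

# Convert the outer defaultdict and the inner Counters to plain dicts.
plain = {k: dict(v) for k, v in counts.items()}
print(plain)
# {'25-34': {'Clicks': 30, 'Visits': 4}, '45-54': {'Clicks': 12, 'Visits': 6}}
```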

Upvotes: 8
