Reputation: 2439

Get the number of all keys in a dictionary of dictionaries in Python

I have a dictionary of dictionaries in Python 2.7.

I need to quickly count the number of all keys, including the keys within each of the dictionaries.

So in this example I would need the number of all keys to be 6:

dict_test = {'key2': {'key_in3': 'value', 'key_in4': 'value'}, 'key1': {'key_in2': 'value', 'key_in1': 'value'}}

I know I can iterate through each key with for loops, but I am looking for a quicker way to do this, since I will have thousands/millions of keys and doing this is just ineffective:

count_the_keys = 0

for key in dict_test.keys():
    for key_inner in dict_test[key].keys():
       count_the_keys += 1

# something like this would be more effective
# of course .keys().keys() doesn't work
print len(dict_test.keys()) * len(dict_test.keys().keys())

Upvotes: 40

Answers (10)

Kasravnd

Reputation: 107347

As a more general way you can use a recursion function and generator expression:

>>> def count_keys(dict_test):
...     return sum(1+count_keys(v) if isinstance(v,dict) else 1 for _,v in dict_test.iteritems())
...

Example:

>>> dict_test = {'a': {'c': '2', 'b': '1', 'e': {'f': {1: {5: 'a'}}}, 'd': '3'}}
>>> 
>>> count(dict_test)
8

Note: In python 3.X use dict.items() method instead of iteritems().

A benchmark with accepted answer which shows that this function is faster than accepted answer:

from timeit import timeit

s1 = """
def sum_keys(d):
    return 0 if not isinstance(d, dict) else len(d) + sum(sum_keys(v) for v in d.itervalues())

sum_keys(dict_test)
"""

s2 = """
def count_keys(dict_test):
    return sum(1+count_keys(v) if isinstance(v,dict) else 1 for _,v in dict_test.iteritems())

count_keys(dict_test)
   """

print '1st: ', timeit(stmt=s1,
                      number=1000000,
                      setup="dict_test = {'a': {'c': '2', 'b': '1', 'e': {'f': {1: {5: 'a'}}}, 'd': '3'}}")
print '2nd : ', timeit(stmt=s2,
                       number=1000000,
                       setup="dict_test = {'a': {'c': '2', 'b': '1', 'e': {'f': {1: {5: 'a'}}}, 'd': '3'}}")

result:

1st:  4.65556812286
2nd :  4.09120802879

Upvotes: 9

donkopotamus

Reputation: 23206

Keeping it Simple

If we know all the values are dictionaries, and do not wish to check that any of their values are also dictionaries, then it is as simple as:

len(dict_test) + sum(len(v) for v in dict_test.itervalues())

Refining it a little, to actually check that the values are dictionaries before counting them:

len(dict_test) + sum(len(v) for v in dict_test.itervalues() if isinstance(v, dict))

And finally, if you wish to do an arbitrary depth, something like the following:

def sum_keys(d):
    return (0 if not isinstance(d, dict) 
            else len(d) + sum(sum_keys(v) for v in d.itervalues())

print sum_keys({'key2': {'key_in3': 'value', 'key_in4': 'value'}, 
                'key1': {'key_in2': 'value', 
                         'key_in1': dict(a=2)}})
# => 7

In this last case, we define a function that will be called recursively. Given a value d, we return either:

0 if that value is not a dictionary; or
the number of keys in the dictionary, plus the total of keys in all of our children.

Making it Faster

The above is a succinct and easily understood approach. We can get a little faster using a generator:

def _counter(d):
    # how many keys do we have?
    yield len(d)

    # stream the key counts of our children
    for v in d.itervalues():
        if isinstance(v, dict):
            for x in _counter(v):
                yield x

def count_faster(d):
    return sum(_counter(d))

This gets us a bit more performance:

In [1]: %timeit sum_keys(dict_test)
100000 loops, best of 3: 4.12 µs per loop

In [2]: %timeit count_faster(dict_test)
100000 loops, best of 3: 3.29 µs per loop

Upvotes: 34

Sede

Reputation: 61263

Using a generator function and the yield from syntax new in Python 3.x. This will work for an arbitrary nested dictionary

>>> from collections import Mapping
>>> def count_keys(mydict):
...     for key, value in mydict.items():
...         if isinstance(value, Mapping):
...             yield from count_keys(value)
...     yield len(mydict)
... 
>>> dict_test = {'key2': {'key_in3': 'value', 'key_in4': 'value'}, 'key1': {'key_in2': 'value', 'key_in1': 'value'}}
>>> sum(count_keys(dict_test))
6

In Python 2.x you need a to do this:

>>> def count_keys(mydict):
...     for key, value in mydict.items():
...         if isinstance(value, Mapping):
...             for item in count_keys(value):
...                 yield 1
...         yield 1
... 
>>> sum(count_keys(dict_test))
6

Upvotes: 6

Dharmik

Reputation: 445

Here is the recursive function to find the nested dictionaries' total number of keys...

s=0
def recurse(v):
   if type(v)==type({}):
     for k in v.keys():
      global s
      s+=1
      recurse(v[k])

Upvotes: 5

phil_20686

Reputation: 4080

len(dict) will return the number of keys in a dictionary, so, assuming you know how nested it is and that all the values are dictionaries:

counter = len(outer_dict)
for v in outer_dict.values :
    counter += len(v)

You can wrap this in a list comprehension :

counter = len(outer_dict)
counter += sum([len(inner_dict) for inner_dict in outer_dict.values])

which is probably the most pythonic. You can extend it as :

counter = len(outer_dict)
counter += sum([len(inner_dict) if isinstance(inner_dict, dict) else 0 for inner_dict in outer_dict.values])

but I tend to think that this is fairly unreadable.

Upvotes: 4

Konstantyn

Reputation: 137

recursive function:

def count_keys(some_dict):
    count = 0
    for key in some_dict:
        if isinstance(some_dict[key], dict):
            count += count_keys(some_dict[key])
        count += 1
    return count

Upvotes: 4

MaxBenChrist

Reputation: 547

How about

n = sum([len(v)+1 for k, v in dict_test.items()])

What you are doing is iterating over all keys k and values v. The values v are your subdictionaries. You get the length of those dictionaries and add one to include the key used to index the subdictionary.

Afterwards you sum over the list to get the complete number of keys.

EDIT:

To clarify, this snippet works only for dictionaries of dictionaries as asked. Not dictionaries of dictionaries of dictionaries...
So do not use it for nested example :)

Upvotes: 9

Viacheslav Kondratiuk

Reputation: 8889

Something like:

print len(dict_test) + sum(len(v) for v in dict_test.values())

Upvotes: 5

matino

Reputation: 17725

You could try using pandas DataFrame for that:

>>> import pandas as pd
>>> data = {'1': {'2': 'a', '3': 'b'}, '4': {'5': 'c', '6': 'd'}, '7': {'5': 'x'}}
>>> df = pd.DataFrame(data)
>>> print (df.count().sum() + len(df.columns))  # 8

The pd.DataFrame(data) line will convert your dictionary to a N x M matrix, where N is number of "parent" keys and M is the number of unique children keys:

     1    4    7
2    a  NaN  NaN
3    b  NaN  NaN
5  NaN    c    x
6  NaN    d  NaN

For each [row, column] you have a value or NaN. You just need to count the non NaN values, which will give you the number of children keys and add len(df.columns), which stands for the number of columns (i.e. parent keys).

Upvotes: 4

Dharmik

Reputation: 445

Try this,

l = len(dict_test)
for k in dict_test:
    l += len(dict_test[k])

Upvotes: 3

Get the number of all keys in a dictionary of dictionaries in Python

Answers (10)

Related Questions