jamieb
jamieb

Reputation: 10033

How to count all elements in a nested dictionary?

How do I count the number of subelements in a nested dictionary in the most efficient manner possible? The len() function doesn't work as I initially expected it to:

>>> food_colors = {'fruit': {'orange': 'orange', 'apple': 'red', 'banana': 'yellow'}, 'vegetables': {'lettuce': 'green', 'beet': 'red', 'pumpkin': 'orange'}}
>>> len(food_colors)
2
>>>

What if I actually want to count the number of subelements? (e.g., expected result to be "6") Is there a better way to do this rather than looping through each element and summing the qty of subelements? In this particular application, I have about five million subelements to count and every clock cycle counts.

Upvotes: 25

Views: 40235

Answers (10)

zwol
zwol

Reputation: 140569

Is it guaranteed that each top-level key has a dictionary as its value, and that no second-level key has a dictionary? If so, this will go as fast as you can hope for:

sum(len(v) for v in food_colors.values())

If the data structure is more complicated, it will need more code, of course. I'm not aware of any intrinsics to do deep data structure walks.

Upvotes: 20

ms5p3
ms5p3

Reputation: 1

I've used nested dictionaries with more than one level. e.g. in your example if 'apple': 'red', was instead 'apples': {'granny smith': 'green', 'gala': 'red'}, a recursive function can give you an answer. For multiple levels, I find you need to carry the count. I use this if I want to count every element.

def dcount(dct, ci=0):
    # count all items in dict tree (including heirarchy)
    ci = ci + len(dct)
    if isinstance(dct, dict):
        for k1, v1 in dct.items():
            # if the lowest level is list, use instead
            # if isinstance(v1, (dict, list, tuple)):
            if isinstance(v1, dict): 
                ci = dcount(v1, ci)
    return ci

or, a variation

def dcount(dct, ci=0):
    for k1, v1 in dct.items():
        ci += 1
        if isinstance(v1, dict):
            ci = dcount(v1, ci)
    return ci

>>>d = {1: 1, 2: 2, 3: {4: 4, 5: {6: 6, 7: 7, 8: 8}, 9: 9}, 10: 10}
>>>dcount(d)
>>>10

Upvotes: 0

manecosta
manecosta

Reputation: 8772

Arbitrary depth, one liner:

def count(d):
    return sum([count(v) if isinstance(v, dict) else 1 for v in d.values()])

Upvotes: 14

Dodgie
Dodgie

Reputation: 663

For arbitrary depth nested dictionaries:

def num_elements(x):
  if isinstance(x, dict):
    return sum([num_elements(_x) for _x in x.values()])
  else: return 1

Upvotes: 4

Tim McDonald
Tim McDonald

Reputation: 1252

You could do this with a recursive function.

>>> x
{'a': 1, 'b': 2, 'c': 3, 'd': {'I': 1, 'II': 2, 'III': 3}, 'e': 5}
>>> def test(d):
...   cnt = 0
...   for e in d:
...     if type(d[e]) is dict:
...       cnt += test(d[e])
...     else:
...       cnt += 1
...   return cnt
...
>>> test(x)
7

Upvotes: 6

dawg
dawg

Reputation: 103834

For your specific question, you can just use this:

>>> d={'fruit': 
         {'orange': 'orange', 'apple': 'red', 'banana': 'yellow'}, 
       'vegetables': 
         {'lettuce': 'green', 'beet': 'red', 'pumpkin': 'orange'}}
>>> len(d)
2            # that is 1 reference for 'fruit' and 1 for 'vegetables'
>>> len(d['fruit'])
3            # 3 fruits listed...
>>> len(d['vegetables'])
3            # you thought of three of those...
>>> len(d['fruit'])+len(d['vegetables'])
6

While you can use the various tools that Python has to count the elements in this trivial dictionary, the more interesting and productive thing is to think about the structure of the data in the first place.

The basic data structures of Python are lists, sets, tuples, and dictionaries. Any of these data structures can 'hold', by reference, any nested version of itself or the other data structures.

This list is a nested list:

>>> l = [1, [2, 3, [4]], [5, 6]]
>>> len(l)
3
>>> l[0]
1
>>> l[1]
[2, 3, [4]]
>>> l[2]
[5, 6]

The first element is the integer 1. Elements 1 and 2 are lists themselves. The same can be true of any other of the basic Python data structures. These are recursive data structures. You can print them with pprint

If you organize your dictionary a bit better, it is easier to extract information from it with Python's simplest tools:

>>> color='color'
>>> family='family'
>>> sensation='sensation'
>>> good_things={   
            'fruit': 
            {
                'orange': 
                    {
                    color: 'orange', 
                    family: 'citrus',
                    sensation: 'juicy'
                    }, 
                'apple': 
                    {
                    color: ['red','green','yellow'], 
                    family:'Rosaceae',
                    'sensation': 'woody'
                    },
                'banana': 
                    {
                    color: ['yellow', 'green'],
                    family: 'musa',
                    sensation: 'sweet'
                    }
            },
            'vegatables': 
            {
                'beets': 
                    {
                    color: ['red', 'yellow'],
                    family: 'Chenopodiaceae',
                    sensation: 'sweet'
                    },
                'broccoli':
                    {
                    color: 'green',
                    family: 'kale',
                    sensation: 'The butter you put on it',
                    }
            }
        }    

Now the queries against that data make more sense:

>>> len(good_things)
2                        # 2 groups: fruits and vegetables
>>> len(good_things['fruit'])
3                        # three fruits cataloged
>>> len(good_things['vegetables'])
2                        # I can only think of two vegetables...
>>> print good_things['fruit']['apple']
{'color': ['red', 'green', 'yellow'], 'sensation': 'woody', 'family': 'Rosaceae'}
>>> len(good_things['fruit']['apple']['color'])
3                        # apples have 3 colors

Upvotes: 6

bobzsj87
bobzsj87

Reputation: 818

c = sum([len(i) for i in fruit_colors.values() ])

Upvotes: 1

eichin
eichin

Reputation: 186

The subelements are distinct objects, there's no other relationship to use that will be fundamentally faster than iterating over them - though there are lots of ways to do that (using map, or .values(), for example) that will vary in performance, enough that you'll probably want to use timeit to compare them.

If counting them is important to your application, consider doing some things to make that easier:

  • count them as you build the data structure
  • instead of nested dicts, consider an in-memory sqlite table, using connect(":memory:") (this might slow down other operations, or make them more complex, but the trade-off is worth considering.)

Upvotes: 1

carl
carl

Reputation: 50554

Do you only want the immediate children? If so, this is probably the best:

sum(len(x) for x in fc.values())

Upvotes: 0

threenplusone
threenplusone

Reputation: 2122

sum(len(x) for x in food_colors.values())

Upvotes: 0

Related Questions