Joe Pinsonault

Reputation: 537

Union of all keys from a list of dictionaries

Say I have a list of dictionaries. They mostly have the same keys in each row, but a few don't match and have extra key/value pairs. Is there a fast way to get a set of all the keys in all the rows?

Right now I'm using this loop:

def get_all_keys(dictlist):
    keys = set()
    for row in dictlist:
        keys = keys.union(row.keys())
    return keys

It just seems terribly inefficient to do this on a list with hundreds of thousands of rows, but I'm not sure how to do it better.

Thanks!

Upvotes: 2

Views: 3901

Answers (5)

mgilson

Reputation: 309899

A fun one which works on Python 3.x [1] relies on reduce and the fact that dict.keys() now returns a set-like object:

>>> from functools import reduce
>>> dicts = [{1:2},{3:4},{5:6}]
>>> reduce(lambda x,y:x | y.keys(),dicts,{})
{1, 3, 5}

For what it's worth,

>>> reduce(lambda x,y:x | y.keys(),dicts,set())
{1, 3, 5}

works too, or, if you want to avoid a lambda (and the initializer), you could even do:

>>> import operator
>>> reduce(operator.or_, (d.keys() for d in dicts))

Very neat.

This really shines most when you only have two elements. Then, instead of doing something like set(a) | set(b), you can do a.keys() | b.keys() which seems a little nicer to me.


[1] It can be made to work on Python 2.7 as well: use dict.viewkeys instead of dict.keys.
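
As a sketch of the view-based approach on Python 3, the reduce call can be wrapped in a function. The name union_of_keys and the sample data are only illustrative. Passing set() as the initializer is an addition here, so that an empty list gives an empty set instead of raising TypeError.

```python
from functools import reduce
import operator

def union_of_keys(dicts):
    # dict.keys() returns a set-like view on Python 3, so | works on it.
    # set() as the initializer keeps reduce from failing on an empty list.
    return reduce(operator.or_, (d.keys() for d in dicts), set())

dicts = [{1: 2}, {3: 4}, {5: 6, 1: 0}]
result = union_of_keys(dicts)
```

Without the initializer, reduce has nothing to return for an empty input, which is the one case the shorter form above does not cover.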

Upvotes: 4

martineau

Reputation: 123453

Sets are like dictionaries and have an update() method, so this would work in your loop:

keys.update(row.iterkeys())  # Python 2; on Python 3 use keys.update(row)
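
For reference, here is a sketch of the asker's loop rewritten with update(). It assumes Python 3, where iterkeys() no longer exists; iterating a dictionary yields its keys, so the row can be passed directly. The sample rows are made up.

```python
def get_all_keys(dictlist):
    keys = set()
    for row in dictlist:
        # update() adds to the existing set in place;
        # iterating a dict yields its keys
        keys.update(row)
    return keys

rows = [{"a": 1, "b": 2}, {"a": 3, "c": 4}]
all_keys = get_all_keys(rows)
```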

Upvotes: 1

Elazar

Reputation: 21595

You can do:

from itertools import chain
def get_all_keys(dictlist):
    return set(chain.from_iterable(dictlist))

As @Jon Clements noted, this keeps only the required data in memory: chain.from_iterable consumes dictlist lazily, whereas using the * operator with either chain or union first unpacks the whole input into an argument tuple.
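
A small sketch of where that matters: because chain.from_iterable reads its argument lazily, the rows can come from a generator and never need to exist as a full list. The read_rows generator below is a made-up stand-in for a real data source.

```python
from itertools import chain

def read_rows():
    # Hypothetical source that yields one dictionary at a time
    yield {"id": 1, "name": "a"}
    yield {"id": 2, "name": "b", "extra": True}
    yield {"id": 3}

# Each dictionary is iterated (yielding its keys) as it arrives
keys = set(chain.from_iterable(read_rows()))
```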

Upvotes: 3

Jon Clements

Reputation: 142136

You could try:

def all_keys(dictlist):
    return set().union(*dictlist)

Avoids imports, and will make the most of the underlying implementation of set. Will also work with anything iterable.
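
A quick usage sketch with made-up data. Starting from set() means an empty list simply gives an empty set, and because union accepts any iterables, lists and tuples of keys can be mixed in with dictionaries.

```python
def all_keys(dictlist):
    return set().union(*dictlist)

rows = [{"a": 1, "b": 2}, {"b": 3, "c": 4}, {}]
keys = all_keys(rows)

# No arguments to union: the result is just the empty starting set
empty = all_keys([])

# Non-dict iterables contribute their elements
mixed = all_keys([{"a": 1}, ["b", "c"], ("d",)])
```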

Upvotes: 11

lenz

Reputation: 5817

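If you worry about performance, you should avoid the dict.keys() method on Python 2, since there it creates a list in memory (on Python 3 it returns a view instead). You can also use set.update() instead of union: it modifies the set in place rather than building a new set on every iteration, though I have not measured whether it is actually faster than set.union().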
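
One way to check is to time both variants. The sketch below uses timeit on made-up data; the function names and the shape of the rows are assumptions for illustration, and the numbers will vary by machine and Python version, so no result is claimed here.

```python
import timeit

def with_union(dictlist):
    keys = set()
    for row in dictlist:
        keys = keys.union(row)   # builds a new set every iteration
    return keys

def with_update(dictlist):
    keys = set()
    for row in dictlist:
        keys.update(row)         # mutates the same set
    return keys

# Made-up rows: two shared keys plus one of 50 rotating extra keys
rows = [{"a": i, "b": i, "k%d" % (i % 50): i} for i in range(10000)]

t_union = timeit.timeit(lambda: with_union(rows), number=5)
t_update = timeit.timeit(lambda: with_update(rows), number=5)
print("union: %.4fs  update: %.4fs" % (t_union, t_update))
```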

Upvotes: 0
