Reputation: 621

combining a list of dictionaries with another dictionary

I have a list containing a fixed number of dictionaries, which I have to compare against one other dictionary.

They have the following form (there is no specific form or pattern for keys and values, these are randomly chosen examples):

list1 = [
    {'X1': 'Q587', 'X2': 'Q67G7', ...},
    {'AB1': 'P5K7', 'CB2': 'P678', ...},
    {'B1': 'P6H78', 'C2': 'BAA5', ...}]

dict1 = {
    'X1': set(['B00001', 'B00020', 'B00010']),
    'AB1': set(['B00001', 'B00007', 'B00003']),
    'C2': set(['B00001', 'B00002', 'B00003']),  ...
}

What I want is a new dictionary whose keys are the values from the dictionaries in list1 and whose values are the values from dict1, but only for keys that appear both in a list1 dictionary and in dict1.

I have done this in the following way:

nDicts = len(list1)
resultDict = {}

for key in range(0, nDicts):
    for x in list1[key].keys():
        if x in dict1.keys():
            resultDict.update({list1[key][x]: dict1[x]})
            print resultDict

The desired output should be of the form:

resultDict = {
    'Q587': set(['B00001', 'B00020', 'B00010']),
    'P5K7': set(['B00001', 'B00007', 'B00003']),
    'BAA5': set(['B00001', 'B00002', 'B00003']),  ...
}

This works, but because there is so much data it takes forever. Is there a better way to do this?

EDIT: I have changed the input values a little, the only ones that matter are the keys which intersect between the dictionaries within list1 and those within dict1.

Upvotes: 0

Answers (3)

abarnert

Reputation: 366213

The keys method in Python 2.x makes a list with a copy of all of the keys, and you're doing this not only for each dict in list1 (probably not a big deal, but it's hard to know for sure without knowing your data), but also doing it for dict1 over and over again.

On top of that, doing an in test on a list takes a long time, because it has to check each value in the list until it finds a match, but doing an in test on a dictionary is nearly instant, because it just has to look up the hash value.
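To make the difference concrete, here is a minimal timing sketch. It is written for Python 3 (an assumption; the question uses Python 2), and the names keys_list, lookup and probe are made up for illustration. Exact timings depend on the machine, so none are shown.

```python
import timeit

# 100,000 string keys, stored once as a list and once as a dict.
keys_list = [str(i) for i in range(100000)]
lookup = dict.fromkeys(keys_list)

# The last element is the worst case for a linear scan of the list.
probe = "99999"

# Time 100 membership tests against each container.
list_time = timeit.timeit(lambda: probe in keys_list, number=100)
dict_time = timeit.timeit(lambda: probe in lookup, number=100)

print("list membership: %.6f s" % list_time)
print("dict membership: %.6f s" % dict_time)
```

Both tests give the same answer; only the cost differs, since the list test scans elements one by one while the dict test hashes the key.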

Both keys() calls are actually completely unnecessary: iterating a dict gives you the keys in order (an unspecified order, but the same is true for calling keys()), and an in check on a dict searches the same keys you'd get with keys(). So just removing them does the same thing, but simpler, faster, and with less memory used. So:

for key in range(0,nDicts):
    for x in list1[key]:
        if x in dict1:
            resultDict={list1[key][x]:dict1[x]}
            print resultDict

There are also ways you can simplify this that probably won't help performance that much, but are still worth doing.

You can iterate directly over list1 instead of building a huge list of all the indices and iterating that.

for list1_dict in list1:
    for x in list1_dict:
        if x in dict1:
            resultDict = {list1_dict[x]: dict1[x]}
            print resultDict

And you can get the keys and values in a single step:

for list1_dict in list1:
    for k, v in list1_dict.iteritems():
        if k in dict1:
            resultDict = {v: dict1[k]}
            print resultDict

Also, if you expect most of the values to be found, it will take about twice as long to first check for the value and then look it up as it would to just try to look it up and handle failure. (This is not true if most of the values will not be found, however.) So:

for list1_dict in list1:
    for k, v in list1_dict.iteritems():
        try:
            resultDict = {v: dict1[k]}
            print resultDict
        except KeyError:
            pass
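As a rough Python 3 sketch of the same try/except idea (items() instead of iteritems()): the helper name combine is hypothetical, and it assumes a single combined dictionary is wanted, as in the question's desired output, rather than a new dictionary per match.

```python
def combine(list1, dict1):
    """Map each list1 value to the dict1 value stored under the same key."""
    result = {}
    for list1_dict in list1:
        for k, v in list1_dict.items():
            try:
                # Look the key up directly and handle a miss afterwards.
                result[v] = dict1[k]
            except KeyError:
                # k is not a key of dict1, so there is nothing to record.
                pass
    return result


# Sample data modelled on the question (elided entries left out).
list1 = [
    {'X1': 'Q587', 'X2': 'Q67G7'},
    {'AB1': 'P5K7', 'CB2': 'P678'},
    {'B1': 'P6H78', 'C2': 'BAA5'},
]
dict1 = {
    'X1': {'B00001', 'B00020', 'B00010'},
    'AB1': {'B00001', 'B00007', 'B00003'},
    'C2': {'B00001', 'B00002', 'B00003'},
}

print(combine(list1, dict1))
```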

Upvotes: 1

Martijn Pieters

Reputation: 1125398

You can simplify and optimize your operation with set intersections; as of Python 2.7 dictionaries can represent keys as sets using the dict.viewkeys() method, or dict.keys() in Python 3:

resultDict = {}

for d in list1:
    for sharedkey in d.viewkeys() & dict1:
        resultDict[d[sharedkey]] = dict1[sharedkey]

This can even be turned into a dict comprehension:

resultDict = {d[sharedkey]: dict1[sharedkey] 
              for d in list1 for sharedkey in d.viewkeys() & dict1}

I am assuming here you wanted one resulting dictionary, not a new dictionary per shared key.

Demo on your sample input:

>>> list1 = [
...     {'X1': 'AAA1', 'X2': 'BAA5'},
...     {'AB1': 'AAA1', 'CB2': 'BAA5'},
...     {'B1': 'AAA1', 'C2': 'BAA5'},
... ]
>>> dict1 = {
...     'X1': set(['B00001', 'B00002', 'B00003']),
...     'AB1': set(['B00001', 'B00002', 'B00003']),
... }
>>> {d[sharedkey]: dict1[sharedkey] 
...  for d in list1 for sharedkey in d.viewkeys() & dict1}
{'AAA1': set(['B00001', 'B00002', 'B00003'])}

Note that both X1 and AB1 are shared with dictionaries in list1, but in both cases the resulting key is AAA1. Only one of these wins (the last match), but since both values in dict1 are exactly the same anyway, that makes no difference in this case.
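To see the last-match-wins behaviour when the dict1 values do differ, here is a small sketch in Python 3, where keys() already returns a set-like view. The sample values are invented for illustration and are not the asker's data.

```python
list1 = [
    {'X1': 'AAA1', 'X2': 'BAA5'},
    {'AB1': 'AAA1', 'CB2': 'BAA5'},
]
dict1 = {
    'X1': {'B00001'},
    'AB1': {'B00007'},
}

# Both shared keys (X1 and AB1) map to the same new key, 'AAA1'.
# The second dictionary in list1 is processed last, so its value wins.
result = {d[shared]: dict1[shared]
          for d in list1
          for shared in d.keys() & dict1.keys()}

print(result)  # {'AAA1': {'B00007'}}
```

If such collisions matter for your data, a single result dictionary silently drops the earlier value.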

If you wanted separate dictionaries per dictionary in list1, simply move the for d in list1: loop out:

for d in list1:
    resultDict = {d[sharedkey]: dict1[sharedkey] for sharedkey in d.viewkeys() & dict1}
    if resultDict:  # can be empty
        print resultDict

If you really wanted one dictionary per shared key, move another loop out:

for d in list1:
    for sharedkey in d.viewkeys() & dict1:
        resultDict = {d[sharedkey]: dict1[sharedkey]}
        print resultDict

Upvotes: 1

Ankur Agarwal

Reputation: 24788

#!/usr/bin/env python

list1 = [

    {'X1': 'AAA1', 'X2': 'BAA5'},
    {'AB1': 'AAA1', 'CB2': 'BAA5'},
    {'B1': 'AAA1', 'C2': 'BAA5'}

    ]


dict1 = {
    'X1': set(['B00001','B00002','B00003']),
    'AB1': set(['B00001','B00002','B00003'])
}    


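# Lazily yield the (key, value) pairs of every dictionary in list1
g = (k.iteritems() for k in list1)
# Keep pairs whose key is in dict1; map the list1 value to the dict1 value
ite = ((b, dict1[a]) for i in g for a, b in i if a in dict1)

d = dict(ite)
print d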

Upvotes: 0
