Reputation: 621
I have a list containing a fixed number of dictionaries, which I have to compare against one other dictionary.
They have the following form (there is no specific form or pattern for keys and values, these are randomly chosen examples):
list1 = [
{'X1': 'Q587', 'X2': 'Q67G7', ...},
{'AB1': 'P5K7', 'CB2': 'P678', ...},
{'B1': 'P6H78', 'C2': 'BAA5', ...}]
dict1 = {
'X1': set(['B00001', 'B00020', 'B00010']),
'AB1': set(['B00001', 'B00007', 'B00003']),
'C2': set(['B00001', 'B00002', 'B00003']), ...
}
What I want is a new dictionary whose keys are the values of the dictionaries in list1 and whose values are the values of dict1, but only for keys that appear both in a dictionary of list1 and in dict1.
I have done this in the following way:
nDicts = len(list1)
resultDict = {}
for key in range(0, nDicts):
    for x in list1[key].keys():
        if x in dict1.keys():
            resultDict.update({list1[key][x]: dict1[x]})
print resultDict
The desired output should be of the form:
resultDict = {
'Q587': set(['B00001', 'B00020', 'B00010']),
'P5K7': set(['B00001', 'B00007', 'B00003']),
'BAA5': set(['B00001', 'B00002', 'B00003']), ...
}
This works, but because there is so much data it takes forever. Is there a better way to do this?
EDIT: I have changed the input values a little; the only ones that matter are the keys shared between the dictionaries within list1 and dict1.
Upvotes: 0
Views: 127
Reputation: 366213
The keys method in Python 2.x makes a list with a copy of all of the keys, and you're doing this not only for each dict in list1 (probably not a big deal, but it's hard to know for sure without knowing your data), but also for dict1 over and over again.
On top of that, doing an in test on a list takes a long time, because it has to check each value in the list until it finds a match, but doing an in test on a dictionary is nearly instant, because it just has to look up the hash value.
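The difference between the two membership tests can be measured with a rough sketch like the one below. The data is hypothetical (100,000 made-up string keys, not the asker's data), and the absolute timings will vary by machine; only the gap between them is of interest.

```python
import timeit

# Hypothetical data: 100,000 made-up string keys, not taken from the question.
keys = ['K%06d' % i for i in range(100000)]
as_list = list(keys)                 # roughly what keys() returns in Python 2
as_dict = dict.fromkeys(keys, None)  # membership is a hash lookup

probe = keys[-1]  # worst case for the list: the match is the last element

# Both tests give the same answer; only the cost differs.
found_in_list = probe in as_list
found_in_dict = probe in as_dict

list_time = timeit.timeit(lambda: probe in as_list, number=200)
dict_time = timeit.timeit(lambda: probe in as_dict, number=200)
print('list: %.6fs  dict: %.6fs' % (list_time, dict_time))
```

The list test scans up to every element on each call, while the dict test does not depend on how many keys there are.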
Both keys calls are actually completely unnecessary: iterating a dict gives you the keys in order (an unspecified order, but the same is true for calling keys()), and in-checking a dict searches the same keys you'd get with keys(). So, just removing them does the same thing, but simpler, faster, and with less memory used. So:
for key in range(0, nDicts):
    for x in list1[key]:
        if x in dict1:
            resultDict = {list1[key][x]: dict1[x]}
            print resultDict
There are also ways you can simplify this that probably won't help performance that much, but are still worth doing.
You can iterate directly over list1 instead of building a huge list of all the indices and iterating that:
for list1_dict in list1:
    for x in list1_dict:
        if x in dict1:
            resultDict = {list1_dict[x]: dict1[x]}
            print resultDict
And you can get the keys and values in a single step:
for list1_dict in list1:
    for k, v in list1_dict.iteritems():
        if k in dict1:
            resultDict = {v: dict1[k]}
            print resultDict
Also, if you expect most of the values to be found, it will take about twice as long to first check for the value and then look it up as it would to just try to look it up and handle failure. (This is not true if most of the values will not be found, however.) So:
for list1_dict in list1:
    for k, v in list1_dict.iteritems():
        try:
            resultDict = {v: dict1[k]}
            print resultDict
        except KeyError:
            pass
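To compare the check-first and try/except styles side by side while building the single dictionary shown in the question's desired output, a sketch along these lines may help. The function names merge_lbyl and merge_eafp are made up for illustration, the sample data is adapted from the question with the elided entries dropped and the IDs quoted, and items() is used instead of iteritems() so the code also runs on Python 3.

```python
# Sample data adapted from the question (elided "..." entries left out).
list1 = [
    {'X1': 'Q587', 'X2': 'Q67G7'},
    {'AB1': 'P5K7', 'CB2': 'P678'},
    {'B1': 'P6H78', 'C2': 'BAA5'},
]
dict1 = {
    'X1': set(['B00001', 'B00020', 'B00010']),
    'AB1': set(['B00001', 'B00007', 'B00003']),
    'C2': set(['B00001', 'B00002', 'B00003']),
}

def merge_lbyl(dicts, lookup):
    """Check for the key first, then look it up (two hash lookups per hit)."""
    result = {}
    for d in dicts:
        for k, v in d.items():
            if k in lookup:
                result[v] = lookup[k]
    return result

def merge_eafp(dicts, lookup):
    """Look the key up directly and ignore misses (one hash lookup per hit)."""
    result = {}
    for d in dicts:
        for k, v in d.items():
            try:
                result[v] = lookup[k]
            except KeyError:
                pass
    return result

print(merge_eafp(list1, dict1) == merge_lbyl(list1, dict1))
```

Both functions return the same dictionary; which one is faster depends on how often the key is actually present, so it is worth timing both on the real data.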
Upvotes: 1
Reputation: 1125398
You can simplify and optimize your operation with set intersections; as of Python 2.7 dictionaries can represent keys as sets using the dict.viewkeys() method, or dict.keys() in Python 3:
resultDict = {}
for d in list1:
    for sharedkey in d.viewkeys() & dict1:
        resultDict[d[sharedkey]] = dict1[sharedkey]
This can be turned into a dict comprehension even:
resultDict = {d[sharedkey]: dict1[sharedkey]
for d in list1 for sharedkey in d.viewkeys() & dict1}
I am assuming here you wanted one resulting dictionary, not a new dictionary per shared key.
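For Python 3, where keys() already returns a set-like view, the same comprehension might look like the sketch below. The sample data is adapted from the question with the elided entries dropped and the IDs quoted, so treat it as illustrative rather than the asker's real input.

```python
# Python 3 only: dict.keys() returns a view that supports set operations.
list1 = [
    {'X1': 'Q587', 'X2': 'Q67G7'},
    {'AB1': 'P5K7', 'CB2': 'P678'},
    {'B1': 'P6H78', 'C2': 'BAA5'},
]
dict1 = {
    'X1': {'B00001', 'B00020', 'B00010'},
    'AB1': {'B00001', 'B00007', 'B00003'},
    'C2': {'B00001', 'B00002', 'B00003'},
}

# The intersection yields only the keys present in both d and dict1.
result_dict = {
    d[shared_key]: dict1[shared_key]
    for d in list1
    for shared_key in d.keys() & dict1
}
print(result_dict)
```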
Demo on your sample input:
>>> list1 = [
... {'X1': 'AAA1', 'X2': 'BAA5'},
... {'AB1': 'AAA1', 'CB2': 'BAA5'},
... {'B1': 'AAA1', 'C2': 'BAA5'},
... ]
>>> dict1 = {
... 'X1': set(['B00001', 'B00002', 'B00003']),
... 'AB1': set(['B00001', 'B00002', 'B00003']),
... }
>>> {d[sharedkey]: dict1[sharedkey]
... for d in list1 for sharedkey in d.viewkeys() & dict1}
{'AAA1': set(['B00001', 'B00002', 'B00003'])}
Note that both X1 and AB1 are shared with dictionaries in list1, but in both cases, the resulting key is AAA1. Only one of these wins (the last match), but since both values in dict1 are exactly the same anyway, that makes no difference in this case.
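When the colliding values differ, the overwrite becomes visible. The sketch below uses made-up data (Python 3 syntax) to show the last match winning, followed by one possible alternative that unions the sets instead. The union variant is not part of the answer above, only an option to consider if overwriting is not what you want.

```python
from collections import defaultdict

# Hypothetical data: two shared keys that map to the same result key 'AAA1'
# but carry different sets in dict1.
list1 = [{'X1': 'AAA1'}, {'AB1': 'AAA1'}]
dict1 = {'X1': {'B00001'}, 'AB1': {'B00007'}}

# Plain comprehension: the later dictionary in list1 overwrites the earlier one.
overwritten = {d[k]: dict1[k] for d in list1 for k in d.keys() & dict1}
print(overwritten)

# Alternative: accumulate a union of all matching sets per result key.
merged = defaultdict(set)
for d in list1:
    for k in d.keys() & dict1:
        merged[d[k]] |= dict1[k]  # updates the new set, dict1 is left untouched
print(dict(merged))
```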
If you wanted separate dictionaries per dictionary in list1, simply move the for d in list1: loop out:
for d in list1:
    resultDict = {d[sharedkey]: dict1[sharedkey] for sharedkey in d.viewkeys() & dict1}
    if resultDict:  # can be empty
        print resultDict
If you really wanted one dictionary per shared key, move another loop out:
for d in list1:
    for sharedkey in d.viewkeys() & dict1:
        resultDict = {d[sharedkey]: dict1[sharedkey]}
        print resultDict
Upvotes: 1
Reputation: 24788
#!/usr/bin/env python
list1 = [
{'X1': 'AAA1', 'X2': 'BAA5'},
{'AB1': 'AAA1', 'CB2': 'BAA5'},
{'B1': 'AAA1', 'C2': 'BAA5'}
]
dict1 = {
'X1': set(['B00001','B00002','B00003']),
'AB1': set(['B00001','B00002','B00003'])
}
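# Lazily walk the (key, value) pairs of every dictionary in list1
g = (k.iteritems() for k in list1)
# Keep only the pairs whose key is also in dict1 ("in" replaces the deprecated has_key)
ite = ((a, b) for i in g for a, b in i if a in dict1)
d = dict(ite)
print d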
Upvotes: 0