Reputation: 4939
I have a dictionary d1 and a list l1.
The dictionary keys are strings, and the values are objects I have defined myself. I can describe the object in more detail if it helps, but for now the relevant part is that each object has a list attribute names, and some of the elements of names may or may not appear in l1.
What I want to do is throw away any element of the dictionary d1 whose object's names attribute does not contain any of the elements that appear in l1.
As a trivial example:
l1 = ['cat', 'dog', 'mouse', 'horse', 'elephant',
'zebra', 'lion', 'snake', 'fly']
d1 = {'1':['dog', 'mouse', 'horse','orange', 'lemon'],
'2':['apple', 'pear','cat', 'mouse', 'horse'],
'3':['kiwi', 'lime','cat', 'dog', 'mouse'],
'4':['carrot','potato','cat', 'dog', 'horse'],
'5':['chair', 'table', 'knife']}
so the resulting dictionary will be more or less the same: it will contain the key-value pairs from 1 to 4, with the fruit and vegetables removed from each list, and it will not contain a 5th key-value pair, as none of the furniture values appear in l1.
To do this I used a nested list/dictionary comprehension which looked like this:
d2 = {k: [a for a in l1 if a in d1[k]] for k in d1.keys()}
print(d2)
>>>>{'1': ['dog', 'mouse', 'horse'],
'3': ['cat', 'dog', 'mouse'],
'2': ['cat', 'mouse', 'horse'],
'5': [],
'4': ['cat', 'dog', 'horse']}
d2 = {k: v for k,v in d2.iteritems() if len(v)>0}
print(d2)
>>>>{'1': ['dog', 'mouse', 'horse'],
'3': ['cat', 'dog', 'mouse'],
'2': ['cat', 'mouse', 'horse'],
'4': ['cat', 'dog', 'horse'],}
This seems to work, but for big dictionaries, 7000+ items, it takes around 20 seconds to work through. In and of itself, not horrible, but I need to do this inside a loop that will iterate 10,000 times, so currently it's not feasible. Any suggestions on how to do this quickly?
Upvotes: 14
Views: 7674
Reputation: 133554
l1 = ['cat', 'dog', 'mouse', 'horse', 'elephant',
'zebra', 'lion', 'snake', 'fly']
d1 = {'1':['dog', 'mouse', 'horse','orange', 'lemon'],
'2':['apple', 'pear','cat', 'mouse', 'horse'],
'3':['kiwi', 'lime','cat', 'dog', 'mouse'],
'4':['carrot','potato','cat', 'dog', 'horse'],
'5':['chair', 'table', 'knife']}
def gen_items(valid_name_set, d):
    for k, v in d.iteritems():
        intersection = valid_name_set.intersection(v)
        if intersection:  # not empty
            yield (k, intersection)

print dict(gen_items(set(l1), d1))
Output:
{'1': set(['dog', 'horse', 'mouse']),
'2': set(['cat', 'horse', 'mouse']),
'3': set(['cat', 'dog', 'mouse']),
'4': set(['cat', 'dog', 'horse'])}
Alternatively:
from itertools import ifilter
from operator import itemgetter
set_l1 = set(l1)
d2 = dict(ifilter(itemgetter(1),
((k, set_l1.intersection(v)) for k, v in d1.iteritems())))
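The snippets above are written for Python 2 (iteritems, ifilter, and the print statement). As a rough sketch, assuming Python 3, the same idea can be written with dict.items() and the built-in filter, which is lazy in Python 3. The function name filter_by_names is only illustrative and does not appear in the answer.

```python
from operator import itemgetter

l1 = ['cat', 'dog', 'mouse', 'horse', 'elephant',
      'zebra', 'lion', 'snake', 'fly']
d1 = {'1': ['dog', 'mouse', 'horse', 'orange', 'lemon'],
      '2': ['apple', 'pear', 'cat', 'mouse', 'horse'],
      '3': ['kiwi', 'lime', 'cat', 'dog', 'mouse'],
      '4': ['carrot', 'potato', 'cat', 'dog', 'horse'],
      '5': ['chair', 'table', 'knife']}


def filter_by_names(valid_names, d):
    """Map each key to the names it shares with valid_names, dropping empty results."""
    valid = set(valid_names)  # build the set once, not once per key
    pairs = ((k, valid.intersection(v)) for k, v in d.items())
    # itemgetter(1) selects the intersection; empty sets are falsy and are filtered out
    return dict(filter(itemgetter(1), pairs))


d2 = filter_by_names(l1, d1)
print(d2)
```

As in the original answer, the values of the result are sets, so the order of the names within each value is not preserved.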
Upvotes: 1
Reputation: 776
You are effectively computing the set intersection of each list occurring in the dictionary values with the list l1. Using lists for set intersections is rather inefficient because of the linear searches involved. You should turn l1 into a set and use set.intersection() or set membership tests instead (depending on whether it is acceptable for the result to be a set as well).
The full code could look like this:
l1 = set(l1)
d2 = {k: [s for s in v if s in l1] for k, v in d1.iteritems()}
d2 = {k: v for k, v in d2.iteritems() if v}
Instead of the two dictionary comprehensions, it might also be preferable to use a single for loop here:
l1 = set(l1)
d2 = {}
for k, v in d1.iteritems():
    v = [s for s in v if s in l1]
    if v:
        d2[k] = v
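The question mentions that the real dictionary values are objects with a list attribute names rather than plain lists. Below is a minimal sketch of the same loop for that case. It assumes a hypothetical Record class (the actual class is not shown in the question) and assumes that the filtered names should be returned in a new dictionary rather than written back onto the objects.

```python
class Record(object):
    """Hypothetical stand-in for the user-defined objects in the question."""

    def __init__(self, names):
        self.names = names


def filter_records(d, allowed):
    """Return {key: matching names} for objects with at least one allowed name."""
    allowed = set(allowed)  # average O(1) membership tests instead of list scans
    result = {}
    for key, obj in d.items():
        # Iterating over obj.names keeps the names in their original order
        kept = [name for name in obj.names if name in allowed]
        if kept:  # skip objects with no matching name
            result[key] = kept
    return result


records = {
    '1': Record(['dog', 'mouse', 'horse', 'orange', 'lemon']),
    '5': Record(['chair', 'table', 'knife']),
}
allowed = ['cat', 'dog', 'mouse', 'horse']
print(filter_records(records, allowed))  # {'1': ['dog', 'mouse', 'horse']}
```

If the objects themselves are needed in the result, storing obj instead of kept under the key would be a small change to the same loop.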
Upvotes: 14
Reputation: 1274
If you convert l1 to a set and slightly modify the dict comprehension, you can get this working roughly three times faster:
l1 = set(['cat', 'dog', 'mouse', 'horse', 'elephant',
'zebra', 'lion', 'snake', 'fly'])
d1 = {'1':['dog', 'mouse', 'horse','orange', 'lemon'],
'2':['apple', 'pear','cat', 'mouse', 'horse'],
'3':['kiwi', 'lime','cat', 'dog', 'mouse'],
'4':['carrot','potato','cat', 'dog', 'horse'],
'5':['chair', 'table', 'knife']}
d2 = {k: [a for a in d1[k] if a in l1] for k in d1.keys()}
print(d2)
Here's how you can benchmark the performance:
import timeit
t = timeit.Timer(
"d2 = {k: [a for a in l1 if a in d1[k]] for k in d1.keys()}",
"from __main__ import (d1, l1)",
)
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/100000)
t = timeit.Timer(
'd2 = {k: [a for a in d1[k] if a in l1] for k in d1.keys()}',
"from __main__ import (d1, l1)",
)
print "%.2f usec/pass" % (1000000 * t.timeit(number=100000)/100000)
I'm assuming here that you don't have control over d1, and that converting all values of d1 to sets prior to the filtering is too slow.
Upvotes: 0
Reputation: 229361
Use set:
>>> l1 = ['cat', 'dog', 'mouse', 'horse', 'elephant',
'zebra', 'lion', 'snake', 'fly']
>>> d1 = {'1':['dog', 'mouse', 'horse','orange', 'lemon'],
'2':['apple', 'pear','cat', 'mouse', 'horse'],
'3':['kiwi', 'lime','cat', 'dog', 'mouse'],
'4':['carrot','potato','cat', 'dog', 'horse'],
'5':['chair', 'table', 'knife']}
>>> l1_set = set(l1)
>>> d2 = dict((k, set(d1[k]) & l1_set) for k in d1.keys())
>>> d2
{'1': set(['horse', 'mouse', 'dog']), '3': set(['mouse', 'dog', 'cat']), '2': set(['horse', 'mouse', 'cat']), '5': set([]), '4': set(['horse', 'dog', 'cat'])}
>>> d2 = dict((k, v) for k,v in d2.iteritems() if v)
>>> d2
{'1': set(['horse', 'mouse', 'dog']), '3': set(['mouse', 'dog', 'cat']), '2': set(['horse', 'mouse', 'cat']), '4': set(['horse', 'dog', 'cat'])}
Upvotes: 0
Reputation: 599610
The issue is not the dict comprehension, but the nested list comprehension within it: for every key you iterate over all of l1 and do a linear search through the value list. This sort of thing is better done with sets.
s1 = set(l1)
d2 = {k: list(s1.intersection(v)) for k, v in d1.items()}
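Note that the comprehension above keeps keys whose intersection is empty (for the example data, '5' maps to []), and the order within each list is not guaranteed because sets are unordered. Here is a minimal sketch that also drops the empty results, assuming that sorted output is acceptable. The intermediate name pairs is just for illustration.

```python
l1 = ['cat', 'dog', 'mouse', 'horse', 'elephant',
      'zebra', 'lion', 'snake', 'fly']
d1 = {'1': ['dog', 'mouse', 'horse', 'orange', 'lemon'],
      '2': ['apple', 'pear', 'cat', 'mouse', 'horse'],
      '3': ['kiwi', 'lime', 'cat', 'dog', 'mouse'],
      '4': ['carrot', 'potato', 'cat', 'dog', 'horse'],
      '5': ['chair', 'table', 'knife']}

s1 = set(l1)
# Lazily pair each key with its intersection so it is computed only once per key
pairs = ((k, s1.intersection(v)) for k, v in d1.items())
# Keep only non-empty intersections; sorted() gives a deterministic list order
d2 = {k: sorted(common) for k, common in pairs if common}
print(d2)
```

If the original order of each value list matters, a membership test such as [s for s in v if s in s1] preserves it, at the cost of not using set.intersection().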
Upvotes: 4