Reputation: 1693
I have a list of dictionaries, and each dictionary has a key (let's say) 'type' whose value can be 'type1', 'type2', etc. My goal is to filter this list down to a list of the same dictionaries, but only the ones of certain "types". I think I'm just really struggling with list/dictionary comprehensions.
An example list would look like:
exampleSet = [{'type':'type1'},{'type':'type2'},{'type':'type2'}, {'type':'type3'}]
I have a list of key values, for example:
keyValList = ['type2','type3']
The expected resulting list would look like:
expectedResult = [{'type':'type2'},{'type':'type2'},{'type':'type3'}]
I know I could do this with a set of for loops, but there has to be a simpler way. I found a lot of different flavors of this question, but none that really fit the bill and answered it. I would post my attempts at an answer, but they weren't that impressive, so it is probably best to leave this open-ended. Any assistance would be greatly appreciated.
Upvotes: 140
Views: 266387
Reputation: 6539
A universal approach to filtering a list of dictionaries based on key-value pairs:
def get_dic_filter_func(**kwargs):
    """Return a predicate for use with map/filter.

    The predicate takes from a dict only the keys present in kwargs
    and compares the resulting dict with kwargs.
    """
    def func(dic):
        dic_to_compare = {k: v for k, v in dic.items() if k in kwargs}
        return dic_to_compare == kwargs
    return func

def filter_list_of_dicts(list_of_dicts, **kwargs):
    """Filter a list of dicts by key/value pairs.

    The result contains only the dicts that have the same key/value pairs as kwargs.
    """
    filter_func = get_dic_filter_func(**kwargs)
    return list(filter(filter_func, list_of_dicts))
Test Case / How to use
import unittest

# Wrapper class added so that the test method can run under unittest
class FilterListOfDictsTest(unittest.TestCase):
    def test_filter_list_of_dicts(self):
        dic1 = {'a': '1', 'b': 2}
        dic2 = {'a': 1, 'b': 3}
        dic3 = {'a': 2, 'b': 3}
        the_list = [dic1, dic2, dic3]
        self.assertEqual([], filter_list_of_dicts(the_list, x=1))
        self.assertEqual([dic1], filter_list_of_dicts(the_list, a='1'))
        self.assertEqual([dic2], filter_list_of_dicts(the_list, a=1))
        self.assertEqual([dic2, dic3], filter_list_of_dicts(the_list, b=3))
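Applied to the data from the question, the same "every given key must match" check can also be written as a single function with all(). This is only a sketch: filter_dicts_by_items is a hypothetical name, not part of the answer above, and like the original it accepts just one value per key.

```python
def filter_dicts_by_items(list_of_dicts, **wanted):
    """Keep dicts that contain every key/value pair in `wanted` (hypothetical helper)."""
    return [
        d for d in list_of_dicts
        # A missing key never matches, mirroring the sub-dict comparison above
        if all(k in d and d[k] == v for k, v in wanted.items())
    ]

exampleSet = [{'type': 'type1'}, {'type': 'type2'}, {'type': 'type2'}, {'type': 'type3'}]
type2_only = filter_dicts_by_items(exampleSet, type='type2')
print(type2_only)  # [{'type': 'type2'}, {'type': 'type2'}]
```

To match several allowed values for one key, as the question asks, a membership test against a list or set of values is still needed.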
Upvotes: 1
Reputation: 1685
Trying a few answers from this post, I tested the performance of each answer.
As I initially guessed, the list comprehension is the fastest, filter with list is second, and pandas is third, by far.
defined variables:
import pandas as pd
exampleSet = [{'type': 'type' + str(number)} for number in range(0, 1_000_000)]
keyValList = ['type21', 'type950000']
list comprehension
%%timeit
expectedResult = [d for d in exampleSet if d['type'] in keyValList]
60.7 ms ± 188 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
filter and list
%%timeit
expectedResult = list(filter(lambda d: d['type'] in keyValList, exampleSet))
94 ms ± 328 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
pandas
%%timeit
df = pd.DataFrame(exampleSet)
expectedResult = df[df['type'].isin(keyValList)].to_dict('records')
336 ms ± 1.84 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
On a side note, using pandas to deal with a dict is not a great idea, since a pandas.DataFrame is basically a more memory-consuming dict, and if you are not going to use a DataFrame in the end, it is just inefficient.
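One variation that was not part of the timings above: when keyValList holds many values, the `in` test against a list scans it element by element, whereas a set offers average constant-time lookup. The sketch below assumes the values are hashable (they are strings here); allowed_types is a name introduced for illustration, the data is shrunk to 1,000 entries, and no benchmark numbers are claimed for it.

```python
exampleSet = [{'type': 'type' + str(number)} for number in range(0, 1000)]
keyValList = ['type21', 'type950']

# Assumption: the allowed values are hashable, so they can be stored in a set
allowed_types = set(keyValList)

# Same comprehension as before, but membership is checked against the set
expectedResult = [d for d in exampleSet if d['type'] in allowed_types]
print(expectedResult)  # [{'type': 'type21'}, {'type': 'type950'}]
```

With only two allowed values, as in the benchmark, the difference is likely negligible; it should matter more as the list of allowed values grows.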
Upvotes: 42
Reputation: 2130
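Use filter. In Python 2, if the number of dictionaries in exampleSet is too large, use ifilter from the itertools module instead: it returns an iterator rather than filling up your system's memory with the entire list at once. In Python 3, itertools.ifilter no longer exists because the built-in filter already returns an iterator:
# Python 3; on Python 2 use `from itertools import ifilter` and call ifilter instead
for elem in filter(lambda x: x['type'] in keyValList, exampleSet):
    print(elem)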
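A generator expression gives the same lazy behaviour without a lambda. This is a minimal sketch using the question's data; matches, first_match and remaining are names chosen here for illustration.

```python
exampleSet = [{'type': 'type1'}, {'type': 'type2'}, {'type': 'type2'}, {'type': 'type3'}]
keyValList = ['type2', 'type3']

# Nothing is filtered yet; dictionaries are tested one at a time, on demand
matches = (d for d in exampleSet if d['type'] in keyValList)

first_match = next(matches)  # scans only as far as the first matching dict
remaining = list(matches)    # consumes the rest of the generator

print(first_match)  # {'type': 'type2'}
print(remaining)    # [{'type': 'type2'}, {'type': 'type3'}]
```

Note that a generator can be consumed only once; build a list if the result has to be traversed repeatedly.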
Upvotes: 19
Reputation: 52071
You can try a list comprehension:
>>> exampleSet = [{'type':'type1'},{'type':'type2'},{'type':'type2'}, {'type':'type3'}]
>>> keyValList = ['type2','type3']
>>> expectedResult = [d for d in exampleSet if d['type'] in keyValList]
>>> expectedResult
[{'type': 'type2'}, {'type': 'type2'}, {'type': 'type3'}]
Another way is to use filter:
>>> list(filter(lambda d: d['type'] in keyValList, exampleSet))
[{'type': 'type2'}, {'type': 'type2'}, {'type': 'type3'}]
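Both forms use d['type'], which raises KeyError if some dictionary has no 'type' key. If that can happen in your data (an assumption, since the question's example always has the key), a sketch with dict.get skips such dictionaries instead. The records list below is made up for illustration.

```python
# Hypothetical input in which one dictionary lacks the 'type' key
records = [{'type': 'type1'}, {'name': 'no type here'}, {'type': 'type3'}]
keyValList = ['type2', 'type3']

# d.get('type') returns None for a missing key, and None is not in keyValList
safe_result = [d for d in records if d.get('type') in keyValList]
print(safe_result)  # [{'type': 'type3'}]
```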
Upvotes: 236