Reputation: 195
I have a list of dictionaries and I need to count duplicates by specific keys. For example:
[
{'name': 'John', 'age': 10, 'country': 'USA', 'height': 185},
{'name': 'John', 'age': 10, 'country': 'Canada', 'height': 185},
{'name': 'Mark', 'age': 10, 'country': 'USA', 'height': 180},
{'name': 'Mark', 'age': 10, 'country': 'Canada', 'height': 180},
{'name': 'Doe', 'age': 15, 'country': 'Canada', 'height': 185}
]
If will specify 'age' and 'country' it should return
[
{
'age': 10,
'country': 'USA',
'count': 2
},
{
'age': 10,
'country': 'Canada',
'count': 2
},
{
'age': 15,
'country': 'Canada',
'count': 1
}
]
Or if I will specify 'name' and 'height':
[
{
'name': 'John',
'height': 185,
'count': 2
},
{
'name': 'Mark',
'height': 180,
'count': 2
},
{
'name': 'Doe',
'heigth': 185,
'count': 1
}
]
Maybe there is a way to implement this by Counter?
Upvotes: 1
Views: 500
Reputation: 15872
You can use itertools.groupby
with sorted
list:
>>> data = [
{'name': 'John', 'age': 10, 'country': 'USA', 'height': 185},
{'name': 'John', 'age': 10, 'country': 'Canada', 'height': 185},
{'name': 'Mark', 'age': 10, 'country': 'USA', 'height': 180},
{'name': 'Mark', 'age': 10, 'country': 'Canada', 'height': 180},
{'name': 'Doe', 'age': 15, 'country': 'Canada', 'height': 185}
]
>>> from itertools import groupby
>>> key = 'age', 'country'
>>> list_sorter = lambda x: tuple(x[k] for k in key)
>>> grouper = lambda x: tuple(x[k] for k in key)
>>> result = [
{**dict(zip(key, k)), 'count': len([*g])}
for k, g in
groupby(sorted(data, key=list_sorter), grouper)
]
>>> result
[{'age': 10, 'country': 'Canada', 'count': 2},
{'age': 10, 'country': 'USA', 'count': 2},
{'age': 15, 'country': 'Canada', 'count': 1}]
>>> key = 'name', 'height'
>>> result = [
{**dict(zip(key, k)), 'count': len([*g])}
for k, g in
groupby(sorted(data, key=list_sorter), grouper)
]
>>> result
[{'name': 'Doe', 'height': 185, 'count': 1},
{'name': 'John', 'height': 185, 'count': 2},
{'name': 'Mark', 'height': 180, 'count': 2}]
If you use pandas
then you can use, pandas.DataFrame.groupby
, pandas.groupby.size
, pandas.Series.to_frame
, pandas.DataFrame.reset_index
and finally pandas.DataFrame.to_dict
with orient='records'
:
>>> import pandas as pd
>>> df = pd.DataFrame(data)
>>> df.groupby(list(key)).size().to_frame('count').reset_index().to_dict('records')
[{'name': 'Doe', 'height': 185, 'count': 1},
{'name': 'John', 'height': 185, 'count': 2},
{'name': 'Mark', 'height': 180, 'count': 2}]
Upvotes: 2