Reputation: 723
I have a list of dicts taken from a product and its variants, defined like so:
attribute_list = [
    {'Finish': 'Chrome'},
    {'Size': 'Large'},
    {'Weight': '1.6kg'},
    {'Finish': 'Chrome'},
    {'Weight': '1.9kg'}
]
I am looking to create two lists: one that holds the dicts that are not duplicated in the list (exact duplicates collapse to a single entry), i.e.:
compiled_list = [
    {'Finish': 'Chrome'},
    {'Size': 'Large'}
]
...and another that holds the dicts whose key appears with more than one value, i.e.:
duplicates_list = [
    {'Weight': '1.6kg'},
    {'Weight': '1.9kg'}
]
Below is the code I have so far. It gets me as far as having the two lists, but 1) I think it is horribly inefficient, and 2) I can't work out how to remove the first instance of a duplicate dictionary.
compiled_list = list()
compiled_list_keys = list()
duplicates_list = list()

for attribute in attribute_list:
    for k, v in attribute.items():
        if k not in compiled_list_keys:
            compiled_list_keys.append(k)
            compiled_list.append(attribute)
        else:
            if attribute not in compiled_list:
                duplicates_list.append(attribute)
                compiled_list_keys.remove(k)
Upvotes: 3
Views: 3987
Reputation: 7651
This solution uses pandas, a Python package much better suited to this kind of data manipulation. You will see why:
First we convert the list of dicts to a DataFrame, dropping exact duplicates along the way:
import pandas as pd

df = pd.DataFrame([list(attr.items())[0] for attr in attribute_list],
                  columns=['key', 'value']).drop_duplicates()
#>       key   value
#> 0  Finish  Chrome
#> 1    Size   Large
#> 2  Weight   1.6kg
#> 4  Weight   1.9kg
Now we separate the two cases. Notice that row 3, the exact duplicate {'Finish': 'Chrome'}, is already gone. With pandas each case is a one-liner:
compiled_df = df.drop_duplicates(subset='key', keep=False)
#>       key   value
#> 0  Finish  Chrome
#> 1    Size   Large
duplicated_df = df[df.key.duplicated(keep=False)]
#>       key   value
#> 2  Weight   1.6kg
#> 4  Weight   1.9kg
Now we convert back to the original list of dicts:
compiled_list = [{item.key: item.value} for item in compiled_df.itertuples()]
#> [{'Finish': 'Chrome'}, {'Size': 'Large'}]
duplicated_list = [{item.key: item.value} for item in duplicated_df.itertuples()]
#> [{'Weight': '1.6kg'}, {'Weight': '1.9kg'}]
It might not be the most efficient approach, but it is far more versatile. In short, five lines of code (plus the import):
import pandas as pd

df = pd.DataFrame([list(attr.items())[0] for attr in attribute_list],
                  columns=['key', 'value']).drop_duplicates()
compiled_df = df.drop_duplicates(subset='key', keep=False)
duplicated_df = df[df.key.duplicated(keep=False)]
compiled_list = [{item.key: item.value} for item in compiled_df.itertuples()]
duplicated_list = [{item.key: item.value} for item in duplicated_df.itertuples()]
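As a side note on the conversion step: itertuples() yields named tuples whose fields mirror the DataFrame's column names (plus the index), which is why the item.key / item.value attribute access above works:

for item in compiled_df.itertuples():
    print(item)
#> Pandas(Index=0, key='Finish', value='Chrome')
#> Pandas(Index=1, key='Size', value='Large')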
Upvotes: 3
Reputation: 164843
Alternatively, you can restructure your list of dictionaries as a defaultdict of set objects. Then use a couple of list comprehensions to separate the isolated items from the duplicates:
from collections import defaultdict

# Group every value seen for each key; the set collapses exact duplicates
d = defaultdict(set)
for item in attribute_list:
    key, value = next(iter(item.items()))
    d[key].add(value)

# Unique keys keep their single value; duplicated keys are split out per value
compiled_list = [{k: next(iter(v))} for k, v in d.items() if len(v) == 1]
duplicates_list = [{k: w} for k, v in d.items() for w in v if len(v) > 1]

print(compiled_list, duplicates_list, sep='\n')
[{'Finish': 'Chrome'}, {'Size': 'Large'}]
[{'Weight': '1.6kg'}, {'Weight': '1.9kg'}]
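To see why the two comprehensions work, it helps to inspect the intermediate structure; running the loop above on the example data gives (set ordering may vary):

print(dict(d))
#> {'Finish': {'Chrome'}, 'Size': {'Large'}, 'Weight': {'1.6kg', '1.9kg'}}

A key maps to a one-element set when it is unique (exact duplicates collapse for free in the set) and to a multi-element set when it carries conflicting values, which is exactly what the len(v) tests distinguish.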
Upvotes: 0
Reputation: 12025
When you append attribute to duplicates_list, you also have to check for any existing attribute in compiled_list with the same key, remove it from compiled_list, and append it to duplicates_list:
compiled_list = list()
compiled_list_keys = list()
duplicates_list = list()

for attribute in attribute_list:
    for k, v in attribute.items():
        if k not in compiled_list_keys:
            compiled_list_keys.append(k)
            compiled_list.append(attribute)
        else:
            if attribute not in compiled_list:
                # Move the earlier entry with the same key over to the duplicates
                existing_attribute = [d for d in compiled_list if k in d][0]
                compiled_list.remove(existing_attribute)
                duplicates_list.append(existing_attribute)
                duplicates_list.append(attribute)
                compiled_list_keys.remove(k)

print(compiled_list)
print(duplicates_list)
Output
[{'Finish': 'Chrome'}, {'Size': 'Large'}]
[{'Weight': '1.6kg'}, {'Weight': '1.9kg'}]
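Since the question also raised efficiency: the in checks and remove() calls on plain lists above are each O(n), so both this answer and the original attempt are quadratic in the worst case. For reference, a minimal order-preserving sketch in plain Python that stays O(n); it is not from any of the answers above, just an illustration using collections.Counter (assuming the attribute_list from the question):

from collections import Counter

# First pass: count distinct (key, value) pairs per key, ignoring exact duplicates
pair_counts = Counter()
seen_pairs = set()
for attr in attribute_list:
    pair = next(iter(attr.items()))
    if pair not in seen_pairs:
        seen_pairs.add(pair)
        pair_counts[pair[0]] += 1

# Second pass: route each first-seen pair by how many values its key has
compiled_list, duplicates_list, emitted = [], [], set()
for attr in attribute_list:
    key, value = next(iter(attr.items()))
    if (key, value) in emitted:       # skip exact duplicates on output
        continue
    emitted.add((key, value))
    if pair_counts[key] == 1:
        compiled_list.append({key: value})
    else:
        duplicates_list.append({key: value})

print(compiled_list)    # [{'Finish': 'Chrome'}, {'Size': 'Large'}]
print(duplicates_list)  # [{'Weight': '1.6kg'}, {'Weight': '1.9kg'}]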
Upvotes: 0