Pete Dermott

Reputation: 723

How to extract duplicate keys and values from a list of python dictionaries?

I have a list of dicts, taken from a product and its variants, defined like so:

attribute_list = [
    {'Finish': 'Chrome'},
    {'Size': 'Large'},
    {'Weight': '1.6kg'},
    {'Finish': 'Chrome'},
    {'Weight': '1.9kg'}
]

I am looking to create two lists: one containing the dicts whose key is not duplicated in the list (an identical dict repeated, such as {'Finish': 'Chrome'}, only counts once), i.e.:

compiled_list = [
    {'Finish': 'Chrome'},
    {'Size': 'Large'}
]

...and another containing the dicts whose key appears with more than one value, i.e.:

duplicates_list = [
    {'Weight': '1.6kg'},
    {'Weight': '1.9kg'}
]

Below is the code I have so far. It gets me as far as having two lists, but 1) I think it is horribly inefficient and 2) I can't work out how to remove the first instance of a duplicate dictionary from compiled_list.

compiled_list = list()
compiled_list_keys = list()
duplicates_list = list()
for attribute in attribute_list:
    for k, v in attribute.items():
        if k not in compiled_list_keys:
            compiled_list_keys.append(k)
            compiled_list.append(attribute)
        else:
            if attribute not in compiled_list:
                duplicates_list.append(attribute)
                compiled_list_keys.remove(k)
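Running the attempt above on attribute_list makes problem 2 concrete: the second Weight dict is moved to duplicates_list, but the first one stays in compiled_list. A minimal reproduction, with the loop wrapped in a function named split_attempt purely for illustration (that name is hypothetical, not from the question):

```python
attribute_list = [
    {'Finish': 'Chrome'},
    {'Size': 'Large'},
    {'Weight': '1.6kg'},
    {'Finish': 'Chrome'},
    {'Weight': '1.9kg'},
]


def split_attempt(attributes):
    """The loop from the question, unchanged apart from being wrapped in a function."""
    compiled_list = list()
    compiled_list_keys = list()
    duplicates_list = list()
    for attribute in attributes:
        for k, v in attribute.items():
            if k not in compiled_list_keys:
                compiled_list_keys.append(k)
                compiled_list.append(attribute)
            else:
                if attribute not in compiled_list:
                    duplicates_list.append(attribute)
                    compiled_list_keys.remove(k)
    return compiled_list, duplicates_list


compiled_list, duplicates_list = split_attempt(attribute_list)
print(compiled_list)    # [{'Finish': 'Chrome'}, {'Size': 'Large'}, {'Weight': '1.6kg'}]
print(duplicates_list)  # [{'Weight': '1.9kg'}]
```

The first {'Weight': '1.6kg'} is appended to compiled_list before the second Weight key is seen, and nothing in the else branch goes back to move it.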

Upvotes: 3

Views: 3987

Answers (3)

ibarrond

Reputation: 7651

This solution uses pandas, a Python package well suited to this kind of data manipulation. You will see why:

  1. First, convert the list of dicts into a pandas DataFrame with one (key, value) row per dict, dropping exact duplicates:

    import pandas as pd

    df = pd.DataFrame([list(attr.items())[0] for attr in attribute_list],
                      columns=['key', 'value']).drop_duplicates()
    #>      key     value
    #> 0    Finish  Chrome
    #> 1    Size    Large
    #> 2    Weight  1.6kg
    #> 4    Weight  1.9kg
    
  2. Next, split the rows by key. With pandas each half takes a single line:

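    # keep=False discards every row whose key occurs more than once
    compiled_df = df.drop_duplicates(subset='key', keep=False)
    #>      key     value
    #> 0    Finish  Chrome
    #> 1    Size    Large

    # keep=False marks every row whose key occurs more than once
    duplicated_df = df[df.key.duplicated(keep=False)]
    #>      key     value
    #> 2    Weight  1.6kg
    #> 4    Weight  1.9kg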
    
  3. Finally, convert back to the original list-of-dicts format:

    compiled_list = [{item.key: item.value} for item in compiled_df.itertuples()]
    #> [{'Finish': 'Chrome'}, {'Size': 'Large'}]

    duplicated_list = [{item.key: item.value} for item in duplicated_df.itertuples()]
    #> [{'Weight': '1.6kg'}, {'Weight': '1.9kg'}]
    

It may not be the most efficient approach, but it is far more versatile. In short, five lines of code (plus the import):

import pandas as pd

df = pd.DataFrame([list(attr.items())[0] for attr in attribute_list],
                  columns=['key', 'value']).drop_duplicates()
compiled_df = df.drop_duplicates(subset='key', keep=False)
duplicated_df = df[df.key.duplicated(keep=False)]
compiled_list = [{item.key: item.value} for item in compiled_df.itertuples()]
duplicated_list = [{item.key: item.value} for item in duplicated_df.itertuples()]
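If pulling in pandas feels heavy for a list this size, the same three steps can be mirrored with the standard library. This is a sketch, not part of the answer above: dict.fromkeys stands in for drop_duplicates() (it drops repeated pairs and keeps first-seen order), and a collections.Counter over the keys stands in for duplicated(keep=False).

```python
from collections import Counter

attribute_list = [
    {'Finish': 'Chrome'},
    {'Size': 'Large'},
    {'Weight': '1.6kg'},
    {'Finish': 'Chrome'},
    {'Weight': '1.9kg'},
]

# Step 1: one (key, value) tuple per dict, exact duplicates dropped in first-seen order.
pairs = list(dict.fromkeys(next(iter(attr.items())) for attr in attribute_list))

# Step 2: count how many distinct values each key has.
key_counts = Counter(key for key, _ in pairs)

# Step 3: keys seen once go to compiled_list, the rest to duplicated_list.
compiled_list = [{k: v} for k, v in pairs if key_counts[k] == 1]
duplicated_list = [{k: v} for k, v in pairs if key_counts[k] > 1]

print(compiled_list)    # [{'Finish': 'Chrome'}, {'Size': 'Large'}]
print(duplicated_list)  # [{'Weight': '1.6kg'}, {'Weight': '1.9kg'}]
```

Each step is a single pass over the data, so this stays linear in the length of attribute_list.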

Upvotes: 3

jpp

Reputation: 164843

Alternatively, you can restructure your list of dictionaries as a defaultdict of set objects.

Then use a couple of list comprehensions to separate isolated items from duplicates:

from collections import defaultdict

d = defaultdict(set)

for item in attribute_list:
    key, value = next(iter(item.items()))
    d[key].add(value)

compiled_list = [{k: next(iter(v))} for k, v in d.items() if len(v) == 1]
duplicates_list = [{k: w} for k, v in d.items() for w in v if len(v) > 1]

print(compiled_list, duplicates_list, sep='\n')

[{'Finish': 'Chrome'}, {'Size': 'Large'}]
[{'Weight': '1.6kg'}, {'Weight': '1.9kg'}]
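One caveat: iterating over a set gives no guaranteed order, and string hashing is randomized per process, so the two Weight entries in duplicates_list may come out in either order from run to run. If the original order matters, a sketch that swaps each set for a dict used as an insertion-ordered set (dicts preserve insertion order from Python 3.7 onward) could look like this:

```python
from collections import defaultdict

attribute_list = [
    {'Finish': 'Chrome'},
    {'Size': 'Large'},
    {'Weight': '1.6kg'},
    {'Finish': 'Chrome'},
    {'Weight': '1.9kg'},
]

# Each key maps to a dict whose keys act as an ordered set of that key's values.
d = defaultdict(dict)
for item in attribute_list:
    key, value = next(iter(item.items()))
    d[key][value] = None  # a repeated value just overwrites the same entry

compiled_list = [{k: next(iter(v))} for k, v in d.items() if len(v) == 1]
duplicates_list = [{k: w} for k, v in d.items() for w in v if len(v) > 1]

print(compiled_list)    # [{'Finish': 'Chrome'}, {'Size': 'Large'}]
print(duplicates_list)  # [{'Weight': '1.6kg'}, {'Weight': '1.9kg'}]
```

The two list comprehensions are the same as in the answer; only the container for each key's values changes.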

Upvotes: 0

Sunitha

Reputation: 12025

When you append an attribute to duplicates_list, you also have to look for an existing attribute in compiled_list with the same key, remove it from compiled_list and append it to duplicates_list as well:

compiled_list = list()
compiled_list_keys = list()
duplicates_list = list()
for attribute in attribute_list:
    for k, v in attribute.items():
        if k not in compiled_list_keys:
            compiled_list_keys.append(k)
            compiled_list.append(attribute)
        elif attribute not in compiled_list and attribute not in duplicates_list:
            # Move the first dict with this key (if still there) to duplicates_list;
            # keeping k in compiled_list_keys means a third value also lands here
            existing_attribute = [d for d in compiled_list if k in d]
            if existing_attribute:
                compiled_list.remove(existing_attribute[0])
                duplicates_list.append(existing_attribute[0])
            duplicates_list.append(attribute)
print(compiled_list)
print(duplicates_list)

Output

[{'Finish': 'Chrome'}, {'Size': 'Large'}]
[{'Weight': '1.6kg'}, {'Weight': '1.9kg'}]

Upvotes: 0
