back-new

Reputation: 121

How to remove duplicate values from list of dicts and keep original order?

I have a list of dictionaries like this:

time_array_final = [{'day': 15, 'month': 5},{'day': 29, 'month': 5}, {'day': 10, 'month': 6}, {'day': 10, 'month': 6}, {'day': 10, 'month': 6}, {'day': 10, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 14, 'month': 6},{'day': 15, 'month': 6}, {'day': 15, 'month': 6}, {'day': 15, 'month': 6}]

I want to remove the duplicate dictionaries from this list. Here is what I tried:

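import ast

# Convert each dict to a string so it can be stored in a set, then parse the unique strings back into dicts
final = [ast.literal_eval(el1) for el1 in set([str(el2) for el2 in time_array_final])]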

It works, but I need to keep the data in its original order, and the order in my output is changed. Is there a way to remove duplicates while preserving the order of the original list?

Note: the output should contain only unique dictionaries. Where an element repeats, the output should keep exactly one copy of it, as the code above does. For this example, the expected output is:

[{'day': 15, 'month': 5},{'day': 29, 'month': 5},{'day': 10, 'month': 6}, {'day': 12, 'month': 6}, {'day': 14, 'month': 6},{'day': 15, 'month': 6}]

Upvotes: 3

Views: 3283

Answers (3)

Copperfield

Reputation: 8510

As an addendum to @BeRT2me's answer, you can go a step further and use the ListBasedSet recipe from the standard library documentation for collections.abc:

import collections.abc

class ListBasedSet(collections.abc.Set):
    ''' Alternate set implementation favoring space over speed
        and not requiring the set elements to be hashable. '''
    def __init__(self, iterable):
        self.elements = lst = []
        for value in iterable:
            if value not in lst:
                lst.append(value)

    def __iter__(self):
        return iter(self.elements)

    def __contains__(self, value):
        return value in self.elements

    def __len__(self):
        return len(self.elements)

Put it in your toolkit; using it is as simple as:

>>> time_array_final = [{'day': 15, 'month': 5},{'day': 29, 'month': 5}, {'day': 10, 'month': 6}, {'day': 10, 'month': 6}, {'day': 10, 'month': 6}, {'day': 10, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 14, 'month': 6},{'day': 15, 'month': 6}, {'day': 15, 'month': 6}, {'day': 15, 'month': 6}]
>>> 
>>> expected=[{'day': 15, 'month': 5},{'day': 29, 'month': 5},{'day': 10, 'month': 6}, {'day': 12, 'month': 6}, {'day': 14, 'month': 6},{'day': 15, 'month': 6}]
>>> 
>>> expected == list(ListBasedSet(time_array_final))
True
>>> 
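Because ListBasedSet subclasses collections.abc.Set, it also inherits the ABC's mixin operators (|, &, -, <=, ==). The ABC builds their results through its _from_iterable classmethod, which simply calls the constructor, so the results are again de-duplicated ListBasedSet instances. A small sketch; the class is repeated only so the snippet runs on its own, and the two sample lists are made up:

```python
import collections.abc


class ListBasedSet(collections.abc.Set):
    """Set recipe from the collections.abc docs; works with unhashable elements."""

    def __init__(self, iterable):
        self.elements = lst = []
        for value in iterable:
            if value not in lst:  # linear scan using ==, so each check is O(n)
                lst.append(value)

    def __iter__(self):
        return iter(self.elements)

    def __contains__(self, value):
        return value in self.elements

    def __len__(self):
        return len(self.elements)


# Hypothetical sample data, just to exercise the inherited operators
first = ListBasedSet([{'day': 15, 'month': 5}, {'day': 10, 'month': 6}, {'day': 15, 'month': 5}])
second = ListBasedSet([{'day': 10, 'month': 6}, {'day': 12, 'month': 6}])

union = first | second        # Set.__or__, rebuilt via ListBasedSet(...)
common = first & second       # Set.__and__
only_first = first - second   # Set.__sub__

print(list(union))       # [{'day': 15, 'month': 5}, {'day': 10, 'month': 6}, {'day': 12, 'month': 6}]
print(list(common))      # [{'day': 10, 'month': 6}]
print(list(only_first))  # [{'day': 15, 'month': 5}]
print(first <= union)    # True
```

The trade-off is the one stated in the recipe's docstring: every membership test scans the list, so building the set is quadratic in the number of elements. That is fine for a few dozen dates but gets slow on large lists.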

Upvotes: 0

pho

Reputation: 25489

You can create a dictionary where the key is the string representation of the items in your list, and the value is the actual item.

time_array_final = [{'day': 15, 'month': 5},{'day': 29, 'month': 5}, {'day': 10, 'month': 6}, {'day': 10, 'month': 6}, {'day': 10, 'month': 6}, {'day': 10, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 14, 'month': 6},{'day': 15, 'month': 6}, {'day': 15, 'month': 6}, {'day': 15, 'month': 6}]

dedupe_dict = {str(item): item for item in time_array_final}

When it encounters a duplicate item, the dict comprehension overwrites the stored value with the duplicate. That makes no material difference because the two items are equal, and the key keeps the position at which it was first inserted.

Dictionaries preserve insertion order (guaranteed by the language since Python 3.7, and an implementation detail of CPython 3.6), so dict.values() gives you the output you need.

deduped_list = list(dedupe_dict.values())

Which gives:

[{'day': 15, 'month': 5},
 {'day': 29, 'month': 5},
 {'day': 10, 'month': 6},
 {'day': 12, 'month': 6},
 {'day': 14, 'month': 6},
 {'day': 15, 'month': 6}]
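The dict comprehension depends on two properties: iteration follows insertion order, and assigning to a key that already exists replaces its value without moving the key. A quick check of both, using throwaway sample keys and a shortened sample list:

```python
# Throwaway example: re-assigning an existing key keeps its original position
d = {}
d['a'] = 1
d['b'] = 2
d['a'] = 3  # overwrite: the value changes, the position does not

print(list(d))           # ['a', 'b']
print(list(d.values()))  # [3, 2]

# The same mechanics applied to the str()-keyed de-duplication
items = [{'day': 12, 'month': 6}, {'day': 14, 'month': 6}, {'day': 12, 'month': 6}]
dedupe_dict = {str(item): item for item in items}
deduped_list = list(dedupe_dict.values())
print(deduped_list)  # [{'day': 12, 'month': 6}, {'day': 14, 'month': 6}]

# The stored value is the *last* duplicate object, sitting in the *first* position
print(dedupe_dict[str(items[0])] is items[2])  # True
```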

As @Copperfield noted in their comments on another answer, str(dict) is not a reliable way to stringify dicts for comparison: the string depends on the order of the keys, while dict equality does not.

d1 = {'day': 1, 'month': 2}
d2 = {'month': 2, 'day': 1}

d1 == d2 # True
str(d1) == str(d2) # False

To get around this, you could build a frozenset of dict.items() and use that as the key (provided all the values in your dicts are hashable), like so:

dedupe_dict = {frozenset(d.items()): d for d in time_array_final}
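The comprehension above keeps the *last* duplicate as the value, so when two dicts are equal but were built with different key orders, the surviving copy prints with the last one's key order. If you would rather keep the first occurrence, one option is dict.setdefault with the same frozenset key. A minimal sketch (the function name dedupe_dicts and the sample data are illustrative, and it assumes all values are hashable):

```python
def dedupe_dicts(items):
    """Return the dicts in `items` with duplicates removed, keeping first occurrences.

    Assumes every value is hashable, because frozenset(d.items()) must hash them.
    """
    seen = {}
    for d in items:
        # setdefault stores d only if no equal dict has been stored yet
        seen.setdefault(frozenset(d.items()), d)
    return list(seen.values())


data = [{'day': 1, 'month': 2}, {'month': 2, 'day': 1}, {'day': 3, 'month': 2}]
result = dedupe_dicts(data)
print(result)                # [{'day': 1, 'month': 2}, {'day': 3, 'month': 2}]
print(result[0] is data[0])  # True: the first occurrence survives

# An unhashable value (here a list) makes the frozenset key fail
try:
    dedupe_dicts([{'days': [1, 2]}])
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```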

Upvotes: 3

Timur Shtatland

Reputation: 12347

Use a set to keep track of the items already seen. The items are converted to strings because dictionaries are unhashable and cannot be stored in a set (otherwise you get "TypeError: unhashable type: 'dict'"). Iterate over the original list and append each element only if its string representation has not been seen yet.

time_array_final = [{'day': 15, 'month': 5},{'day': 29, 'month': 5}, {'day': 10, 'month': 6}, {'day': 10, 'month': 6}, {'day': 10, 'month': 6}, {'day': 10, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 12, 'month': 6}, {'day': 14, 'month': 6},{'day': 15, 'month': 6}, {'day': 15, 'month': 6}, {'day': 15, 'month': 6}]

time_array_final_unique = []
time_array_final_set = set()

for d in time_array_final:
    if str(d) not in time_array_final_set:
        time_array_final_unique.append(d)
        time_array_final_set.add(str(d))
print(time_array_final_unique)
# [{'day': 15, 'month': 5}, {'day': 29, 'month': 5}, {'day': 10, 'month': 6}, {'day': 12, 'month': 6}, {'day': 14, 'month': 6}, {'day': 15, 'month': 6}]
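The loop above is a special case of the unique_everseen recipe from the itertools documentation: remember each element's key in a set and yield the element only the first time that key appears. A sketch of that pattern with a key function, which also makes it easy to swap str for the key-order-independent frozenset(d.items()) key (assuming hashable values; the sample list is made up):

```python
def unique_everseen(iterable, key=None):
    """Yield elements in their original order, skipping any whose key was already seen."""
    seen = set()
    for element in iterable:
        k = element if key is None else key(element)
        if k not in seen:
            seen.add(k)
            yield element


# Two equal dicts that differ only in key order, plus one distinct dict
time_array = [{'day': 10, 'month': 6}, {'month': 6, 'day': 10}, {'day': 12, 'month': 6}]

by_str = list(unique_everseen(time_array, key=str))
by_items = list(unique_everseen(time_array, key=lambda d: frozenset(d.items())))

print(len(by_str))  # 3: str() treats the reordered dict as different
print(by_items)     # [{'day': 10, 'month': 6}, {'day': 12, 'month': 6}]
```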

Upvotes: 2
