mikey

Reputation: 135

Remove duplicate dict based on field values

Given the following list of dicts, I want to remove duplicates where all fields are identical except for the id field.

old_data = [
{"id":"01","name":"harry","age":21},
{"id":"02","name":"barry","age":32},
{"id":"03","name":"harry","age":44},
{"id":"04","name":"harry","age":21},
{"id":"05","name":"larry","age":66}
]

To produce the following:

new_data = [
{"id":"01","name":"harry","age":21},
{"id":"02","name":"barry","age":32},
{"id":"03","name":"harry","age":44},
{"id":"05","name":"larry","age":66}
]

My current code only works for cases where all fields of the dictionary are identical:

#!/usr/bin/python
new_data = []
for x in old_data:
    # Compares whole dicts, so differing ids make every entry look unique
    if x not in new_data:
        new_data.append(x)

Upvotes: 2

Views: 203

Answers (4)

Kelly Bundy

Reputation: 27588

Only hardcoding 'id', not the other keys:

tmp = {}
for d in old_data:
    k = frozenset(d.items() - {('id', d['id'])})
    tmp.setdefault(k, d)
new_data = list(tmp.values())
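
The same idea can be wrapped in a reusable function. This is only a sketch: the name dedupe_ignoring and its ignore parameter are made up for illustration, and it assumes every field value is hashable (strings and numbers, as in the question).

```python
def dedupe_ignoring(records, ignore=("id",)):
    """Keep the first record for each combination of non-ignored fields.

    Hypothetical helper; assumes all field values are hashable.
    """
    ignored = set(ignore)
    first_seen = {}
    for record in records:
        # A frozenset of (key, value) pairs is hashable and order-independent,
        # so it can be used as a dict key.
        key = frozenset((k, v) for k, v in record.items() if k not in ignored)
        # setdefault stores the record only if the key is not present yet.
        first_seen.setdefault(key, record)
    return list(first_seen.values())


old_data = [
    {"id": "01", "name": "harry", "age": 21},
    {"id": "02", "name": "barry", "age": 32},
    {"id": "03", "name": "harry", "age": 44},
    {"id": "04", "name": "harry", "age": 21},
    {"id": "05", "name": "larry", "age": 66},
]

new_data = dedupe_ignoring(old_data)
print([d["id"] for d in new_data])  # ['01', '02', '03', '05']
```

Since dicts preserve insertion order in Python 3.7+, the result keeps the original order and the first id seen for each group.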

Upvotes: 0

shivankgtm

Reputation: 1242

A straightforward solution is to keep track of the (name, age) pairs already seen in a list.

old_data = [
    {"id":"01","name":"harry","age":21},
    {"id":"02","name":"barry","age":32},
    {"id":"03","name":"harry","age":44},
    {"id":"04","name":"harry","age":21},
    {"id":"05","name":"larry","age":66}
]

track_list = []
new_data = []
for obj in old_data:
    key = [obj['name'], obj['age']]
    # Keep the dict only if this name/age pair has not been seen before
    if key not in track_list:
        track_list.append(key)
        new_data.append(obj)
        
print(new_data)

Output:

[{'id': '01', 'name': 'harry', 'age': 21}, {'id': '02', 'name': 'barry', 'age': 32}, {'id': '03', 'name': 'harry', 'age': 44}, {'id': '05', 'name': 'larry', 'age': 66}]
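
Membership tests on a list scan every stored entry, so the loop above gets slower as the number of distinct pairs grows. A possible variant, assuming the name and age values are hashable, tracks the pairs in a set instead. The function name dedupe_by_fields and its fields parameter are illustrative, not part of the original answer.

```python
def dedupe_by_fields(records, fields=("name", "age")):
    """Keep the first record for each distinct combination of `fields`.

    Hypothetical helper; assumes the selected values are hashable.
    """
    seen = set()
    unique = []
    for record in records:
        # Tuples are hashable (lists are not), so they can be stored in a set.
        key = tuple(record[f] for f in fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


old_data = [
    {"id": "01", "name": "harry", "age": 21},
    {"id": "02", "name": "barry", "age": 32},
    {"id": "03", "name": "harry", "age": 44},
    {"id": "04", "name": "harry", "age": 21},
    {"id": "05", "name": "larry", "age": 66},
]

new_data = dedupe_by_fields(old_data)
print([d["id"] for d in new_data])  # ['01', '02', '03', '05']
```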

Upvotes: 0

Try this; it ignores id in the comparison:

def remove_duplicate(old_data):
    new_data = []
    for i in old_data:
        found = False
        for j in new_data:
            # Same name and age means i duplicates an entry we already kept
            if j['name'] == i['name'] and j['age'] == i['age']:
                found = True
                break
        if not found:
            new_data.append(i)
    return new_data

old_data = [
{"id":"01","name":"harry","age":21},
{"id":"02","name":"barry","age":32},
{"id":"03","name":"harry","age":44},
{"id":"04","name":"harry","age":21},
{"id":"05","name":"larry","age":66}
]

print(remove_duplicate(old_data))

Output:

[{'id': '01', 'name': 'harry', 'age': 21}, {'id': '02', 'name': 'barry', 'age': 32}, {'id': '03', 'name': 'harry', 'age': 44}, {'id': '05', 'name': 'larry', 'age': 66}]

Upvotes: 0

Samwise

Reputation: 71454

Build a dict with the significant part of the dict as the key, then turn the values back into a list:

>>> old_data = [
... {"id":"01","name":"harry","age":21},
... {"id":"02","name":"barry","age":32},
... {"id":"03","name":"harry","age":44},
... {"id":"04","name":"harry","age":21},
... {"id":"05","name":"larry","age":66}
... ]
>>> sorted({(d["name"], d["age"]): d for d in reversed(old_data)}.values(), key=lambda d: d["id"])
[{'id': '01', 'name': 'harry', 'age': 21}, {'id': '02', 'name': 'barry', 'age': 32}, {'id': '03', 'name': 'harry', 'age': 44}, {'id': '05', 'name': 'larry', 'age': 66}]

If you don't care about which specific ids you keep or how they're sorted, it's simpler:

>>> list({(d["name"], d["age"]): d for d in old_data}.values())
[{'id': '04', 'name': 'harry', 'age': 21}, {'id': '02', 'name': 'barry', 'age': 32}, {'id': '03', 'name': 'harry', 'age': 44}, {'id': '05', 'name': 'larry', 'age': 66}]
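
The difference between the two one-liners comes down to how a dict treats a repeated key: the last value written wins, but the key stays in the position where it was first inserted. A small sketch of that behaviour on the question's data (the variable names forward and backward are just for illustration):

```python
old_data = [
    {"id": "01", "name": "harry", "age": 21},
    {"id": "02", "name": "barry", "age": 32},
    {"id": "03", "name": "harry", "age": 44},
    {"id": "04", "name": "harry", "age": 21},
    {"id": "05", "name": "larry", "age": 66},
]

# Forward pass: id 04 overwrites id 01 but sits in 01's original slot.
forward = {(d["name"], d["age"]): d for d in old_data}
print([d["id"] for d in forward.values()])  # ['04', '02', '03', '05']

# Reversed pass: id 01 is written last, so the earliest id wins,
# but the keys are now in reversed insertion order.
backward = {(d["name"], d["age"]): d for d in reversed(old_data)}
print([d["id"] for d in backward.values()])  # ['05', '01', '03', '02']

# Sorting by id puts the surviving records back in their original order.
print([d["id"] for d in sorted(backward.values(), key=lambda d: d["id"])])
# ['01', '02', '03', '05']
```

Note that sorting by id restores the original order only because the ids in this data happen to sort the same way the list is ordered.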

Upvotes: 2
