Removing duplicates in nested dictionaries

Question

I am trying to remove duplicates from a nested dictionary, keeping only the first occurrence of each. In the dictionary below, I want to check whether two entries have the same name and, if they do, remove the second and any subsequent duplicates. For example, entries 4 and 5 both have the name 'Sasha', so entry 5 should be removed.

import datetime

dict1 = {
1: {'friends': [2],
     'history': [],
     'id': 1,
     'name': 'Fred',
     'date_of_birth': datetime.date(2022, 2, 1)},
 2: {'friends': [1],
     'history': [],
     'id': 2,
     'name': 'Jenny',
     'date_of_birth': datetime.date(2004, 11, 18)},
 3: {'friends': [4],
     'history': [],
     'id': 3,
     'name': 'Jiang',
     'date_of_birth': datetime.date(1942, 9, 16)},
 4: {'friends': [3],
     'history': [],
     'id': 4,
     'name': 'Sasha',
     'date_of_birth': datetime.date(1834, 2, 2)},
 5: {'friends': [6],
     'history': [],
     'id': 5,
     'name': 'Sasha',
     'date_of_birth': datetime.date(1834, 2, 2)},
 6: {'friends': [5],
     'history': [],
     'id': 6,
     'name': 'Amir',
     'date_of_birth': datetime.date(1981, 8, 11)}}

I have implemented my solution like this but I don't understand where I am going wrong.

import pprint

temp = []
res = dict()
for key, val in dict1.items():
    if val not in temp:
        temp.append(val)
        res[key] = val

pprint.pprint(res)

It would be great if someone could help me with this.

user7864386 · Accepted Answer

In your for-loop, val is the entire inner dictionary. If you look closely, the inner dictionaries for keys 4 and 5 are not equal (their "friends" and "id" values differ), so the "not in temp" check passes for both and neither is dropped. Since you only need the "name" to match (not the entire dictionary), you can keep track of the names you have already seen and keep only the first entry for each:

temp = []
res = dict()
for key, val in dict1.items():
    if val['name'] not in temp:
        temp.append(val['name'])
        res[key] = val
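
To see why the original check kept both entries, here is a minimal sketch that compares entries 4 and 5 directly. The names entry_4, entry_5, whole_dicts_equal and names_equal are only for illustration and are not part of the original code:

```python
import datetime

# Values copied from entries 4 and 5 of dict1 in the question.
entry_4 = {'friends': [3], 'history': [], 'id': 4, 'name': 'Sasha',
           'date_of_birth': datetime.date(1834, 2, 2)}
entry_5 = {'friends': [6], 'history': [], 'id': 5, 'name': 'Sasha',
           'date_of_birth': datetime.date(1834, 2, 2)}

# Dict equality compares every key/value pair, so differing
# 'friends' and 'id' values make the two records unequal.
whole_dicts_equal = entry_4 == entry_5

# Comparing only the field that defines a "duplicate" gives a match.
names_equal = entry_4['name'] == entry_5['name']

print(whole_dicts_equal)  # False
print(names_equal)        # True
```

Since the list is only used for membership tests, a set would probably serve equally well here and avoids scanning the whole list on each lookup.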

Edit:

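If the goal is also to "shift" the keys so that they stay consecutive, you could approach it a little differently: store only the non-duplicate values in a list res, then zip it with the keys of dict1 to create the output dictionary. Because zip stops at the shorter of its inputs, the leftover keys are dropped: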

temp = set()
res = []
for val in dict1.values():
    if val['name'] not in temp:
        temp.add(val['name'])
        res.append(val)
out = dict(zip(dict1, res))

Output:

{1: {'friends': [2],
  'history': [],
  'id': 1,
  'name': 'Fred',
  'date_of_birth': datetime.date(2022, 2, 1)},
 2: {'friends': [1],
  'history': [],
  'id': 2,
  'name': 'Jenny',
  'date_of_birth': datetime.date(2004, 11, 18)},
 3: {'friends': [4],
  'history': [],
  'id': 3,
  'name': 'Jiang',
  'date_of_birth': datetime.date(1942, 9, 16)},
 4: {'friends': [3],
  'history': [],
  'id': 4,
  'name': 'Sasha',
  'date_of_birth': datetime.date(1834, 2, 2)},
 5: {'friends': [5],
  'history': [],
  'id': 6,
  'name': 'Amir',
  'date_of_birth': datetime.date(1981, 8, 11)}}
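
If you need this in more than one place, both variants could be wrapped in a small helper. This is only a sketch: the function name dedupe_by_field, its shift_keys parameter and the shortened sample data are my own assumptions, not something from the question:

```python
def dedupe_by_field(records, field, shift_keys=False):
    """Keep the first record for each distinct value of `field`.

    Hypothetical helper: `records` is a dict of dicts like dict1.
    """
    seen = set()
    kept = {}
    for key, rec in records.items():
        value = rec[field]
        if value not in seen:
            seen.add(value)
            kept[key] = rec
    if shift_keys:
        # Reuse the first len(kept) keys of the input, as in the
        # zip-based approach; the inner records are left untouched.
        return dict(zip(records, kept.values()))
    return kept


# Shortened sample data for illustration only.
people = {
    1: {'id': 1, 'name': 'Fred'},
    2: {'id': 2, 'name': 'Sasha'},
    3: {'id': 3, 'name': 'Sasha'},
    4: {'id': 4, 'name': 'Amir'},
}

kept = dedupe_by_field(people, 'name')
shifted = dedupe_by_field(people, 'name', shift_keys=True)

print(list(kept))     # [1, 2, 4]
print(list(shifted))  # [1, 2, 3]
```

Note that shifting only changes the outer keys. As the output above shows, the entry now stored under key 5 still has 'id': 6 and 'friends': [5], so those fields would need to be updated separately if they have to stay consistent.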
