user30985

Reputation: 683

Remove the duplicate value in a nested dictionary

I have a nested dictionary d1:

d1={'Hiraki': {'Hiraki_2': ['KANG_785','KANG_785','KANG_762']}, 'LakeTaupo': {'LakeTaupo_2': ['KANG_785', 'KANG_785', 'KANG_785', 'KANG_751']}}

I would like to remove the duplicate values for each key. The result after removing the duplicate values should be:

d1={'Hiraki': {'Hiraki_2': ['KANG_785','KANG_762']}, 'LakeTaupo': {'LakeTaupo_2': ['KANG_785', 'KANG_751']}}

I do not know how to code this in Python. Please help me.

Upvotes: 0

Views: 99

Answers (5)

leopardxpreload

Reputation: 768

Here is a recursive solution:

This modifies the dictionary in place, replacing each list with a set:

d1={'Hiraki': {'Hiraki_2': ['KANG_785','KANG_785','KANG_762']}, 'LakeTaupo': {'LakeTaupo_2': ['KANG_785', 'KANG_785', 'KANG_785', 'KANG_751']}}

# Deals with the tuples: they are immutable, so build a new tuple
# instead of assigning to individual items
def recurse_tuple(my_tup):
    return tuple(recurse_dict(v) if isinstance(v, dict) else v for v in my_tup)

# Deals with the dictionaries and lists
def recurse_dict(my_dict):
    for k, v in my_dict.items():
        if isinstance(v, dict): my_dict[k] = recurse_dict(v)
        if isinstance(v, tuple): my_dict[k] = recurse_tuple(v)
        if isinstance(v, list): my_dict[k] = set(v)
    return my_dict

print(recurse_dict(d1))

#Output
{'Hiraki': {'Hiraki_2': {'KANG_762', 'KANG_785'}}, 'LakeTaupo': {'LakeTaupo_2': {'KANG_785', 'KANG_751'}}}

NOTE: @Samwise has beaten me to the punch with a very neat recursive function.

Upvotes: 1

LevB

Reputation: 953

You can use set() to eliminate duplicates.

d1={'Hiraki': {'Hiraki_2': ['KANG_785','KANG_785','KANG_762']}, 'LakeTaupo': {'LakeTaupo_2': ['KANG_785', 'KANG_785', 'KANG_785', 'KANG_751']}}

d2 = {key1: {key2: list(set(val2)) for key2, val2 in val1.items()}
      for key1, val1 in d1.items()}

print(d2)

Output (the order within each list may vary, because sets are unordered):

{'Hiraki': {'Hiraki_2': ['KANG_785', 'KANG_762']}, 'LakeTaupo': {'LakeTaupo_2': ['KANG_785', 'KANG_751']}}

Upvotes: 1

pepoluan

Reputation: 6780

Basically, if you want to remove duplicate values in a sequence, you convert it to a set then back again.

>>> data = ['KANG_785','KANG_785','KANG_762']
>>> data = list(set(data))
>>> data
['KANG_762', 'KANG_785']

Notice that this will not maintain ordering.
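
If the original order matters, one possible sketch is to go through dict.fromkeys instead of set. Dictionary keys are unique and, since Python 3.7, keep their insertion order, so the first occurrence of each value stays in place. The helper name dedupe_keep_order is made up for this illustration.

```python
def dedupe_keep_order(items):
    # dict.fromkeys keeps one key per distinct value, in first-seen order
    # (guaranteed for the built-in dict since Python 3.7)
    return list(dict.fromkeys(items))


data = ['KANG_785', 'KANG_785', 'KANG_762']
print(dedupe_keep_order(data))  # ['KANG_785', 'KANG_762']
```

Like set(), this only works when the values are hashable, which is the case for the strings in the question.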

Also, consider carefully whether you actually need a list; a set is still iterable, after all. If you want to maintain uniqueness at all times, consider storing the data as a set and converting it to a list only when necessary.

>>> data = ['KANG_785','KANG_785','KANG_762']
>>> data = set(data)
>>> data
{'KANG_762', 'KANG_785'}
>>> for i in data:
...     print(i)
...     
KANG_762
KANG_785
>>> type(data)
<class 'set'>

Upvotes: 1

Harsha Biyani

Reputation: 7268

You can try:

d1={'Hiraki': {'Hiraki_2': ['KANG_785','KANG_785','KANG_762']}, 'LakeTaupo': {'LakeTaupo_2': ['KANG_785', 'KANG_785', 'KANG_785', 'KANG_751']}}

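output = {}
for key, val in d1.items():
    # Create the inner dictionary once per outer key, so that several
    # inner keys are all kept instead of overwriting each other
    output[key] = {}
    for key1, val1 in val.items():
        output[key][key1] = list(set(val1))
print(output)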

Output (the order within each list may vary, because sets are unordered):

{'Hiraki': {'Hiraki_2': ['KANG_785', 'KANG_762']}, 'LakeTaupo': {'LakeTaupo_2': ['KANG_785', 'KANG_751']}}

Upvotes: 1

Samwise

Reputation: 71454

You can use the same strategy as described in this answer:

Convert a mixed nested dictionary into a list

but for the case where isinstance(d, list), return list(set(d)) (which will remove duplicate entries) instead of d.

E.g.:

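def dedupe_lists(d):
    # A list is replaced by a new list of its unique items (order not kept)
    if isinstance(d, list):
        return list(set(d))
    # A dictionary is rebuilt with every value processed recursively
    if isinstance(d, dict):
        return {k: dedupe_lists(v) for k, v in d.items()}
    # Anything else (strings, numbers, ...) is returned unchanged
    return d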
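
As a rough usage sketch, here is the function above applied to the d1 from the question. The function is repeated so the snippet runs on its own. Because list(set(...)) does not guarantee any order, the checks sort each list before comparing.

```python
def dedupe_lists(d):
    if isinstance(d, list):
        return list(set(d))
    if isinstance(d, dict):
        return {k: dedupe_lists(v) for k, v in d.items()}
    return d


d1 = {'Hiraki': {'Hiraki_2': ['KANG_785', 'KANG_785', 'KANG_762']},
      'LakeTaupo': {'LakeTaupo_2': ['KANG_785', 'KANG_785', 'KANG_785', 'KANG_751']}}

result = dedupe_lists(d1)

# Sort before comparing, since the order of the deduplicated lists may vary
print(sorted(result['Hiraki']['Hiraki_2']))        # ['KANG_762', 'KANG_785']
print(sorted(result['LakeTaupo']['LakeTaupo_2']))  # ['KANG_751', 'KANG_785']

# New dictionaries and lists are built, so d1 itself still has its duplicates
print(len(d1['LakeTaupo']['LakeTaupo_2']))  # 4
```

Since the result is a new structure, assign it back (d1 = dedupe_lists(d1)) if you want to replace the original.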

Upvotes: 2
