Reputation: 839
I have a nested dictionary and I'm trying to find duplicates within in. For example, if I have:
dictionary = {'hello': 3 , 'world':{'this': 5 , 'is':{'a': 3, 'dict': None}}}
The return value would be something like:
True
because this dictionary contains duplicates.
I was able to do this quite easily with a regular dictionary, and I thought this would work well with this case too:
dictionary = {'hello': 3 , 'world':{'this': 5 , 'is':{'a': 3, 'dict': None}}}
rev_dictionary = {}
for key, value in dictionary.items():
rev_dictionary.setdefault(value, set()).add(key)
print(rev_dictionary)
for key,values in dictionary.items():
if len(values) > 1:
values = True
else:
values = False
which throws the following error:
TypeError: unhashable type: 'dict'
How can I get this working?
Thanks for the help!
Note: I'd prefer a solution without using libraries if possible
Upvotes: 1
Views: 2307
Reputation: 1773
Convert the nested dictionary into nested lists of their values:
def nested_values(v):
return map(nested_values, v.values()) if isinstance(v, dict) else v
Then flatten the nested lists into one list of all values in the dictionaries, and then check the flattened list of values for duplicates:
from itertools import chain
def is_duplicated_value(d):
flat = list(chain.from_iterable(nested_values(d)))
return len(flat) != len(set(flat))
Test:
print is_duplicated_value( {1:'a', 2:'b', 3:{1:'c', 2:'a'}} )
print is_duplicated_value( {1:'a', 2:'b', 3:{1:'c', 2:'d'}} )
Outputs:
True
False
Depending on your use and size of dictionaries etc you may want to recast these steps into a recursive function that adds each value to a set
checking if each value is in the set before adding and returning True
immediately or False
if the dictionary is exhausted.
class Duplicated(ValueError): pass
def is_dup(d):
values = set()
def add(v):
if isinstance(v, dict):
map(add, v.values())
else:
if v in values:
raise Duplicated
else:
values.add(v)
try:
add(d)
return False
except Duplicated:
return True
Test:
print is_dup( {1:'a', 2:'b', 3:{1:'c', 2:'a'}} )
print is_dup( {1:'a', 2:'b', 3:{1:'c', 2:'d'}} )
Outputs:
True
False
Upvotes: 0
Reputation: 4199
I am assuming you are defining duplicates by value and not by keys. In that case, you can flatten the nested dict using (mentioned here)
def flatten(d):
out = {}
for key, val in d.items():
if isinstance(val, dict):
val = [val]
if isinstance(val, list):
for subdict in val:
deeper = flatten(subdict).items()
out.update({key + '_' + key2: val2 for key2, val2 in deeper})
else:
out[key] = val
return out
and then check for the condition
v = flatten(d).values()
len(set(v))!=len(v)
results in True
Upvotes: 2
Reputation: 907
I wrote a simple solution:
dictionary = {'hello': 3 , 'world':{'this': 5 , 'is':{'a': 3, 'dict': None}}}
def get_dups(a, values=None):
if values is None: values = []
if (a in values): return True
values.append(a)
if type(a) == dict:
for i in a.values():
if (get_dups(i, values=values)):
return True
return False
print(get_dups(dictionary))
We start by saving every value
in a list, which we will pass into the function.
Each run we check whether our current value is in that list, and return True
once there is a duplicate.
if (a in values): return True
Next we just loop through the values and run get_dups
on them if the current index is also a dictionary.
Upvotes: 1
Reputation: 2701
I think all you need is to flatten the dict before passing to your duplication-detection pipeline:
import pandas as pd
def flatten_dict(d):
df = pd.io.json.json_normalize(d, sep='_')
return df.to_dict(orient='records')[0]
dictionary = {'hello': 3 , 'world':{'this': 5 , 'is':{'a': 3, 'dict': None}}}
dictionary = flatten_dict(dictionary)
print('flattend')
print(dictionary)
rev_dictionary = {}
for key, value in dictionary.items():
rev_dictionary.setdefault(value, set()).add(key)
print('reversed')
print(rev_dictionary)
is_duplicate = False
for key, values in rev_dictionary.items():
if len(values) > 1:
is_duplicate = True
break
print('is duplicate?', is_duplicate)
The result:
flattend
{'hello': 3, 'world_is_a': 3, 'world_is_dict': None, 'world_this': 5}
reversed
{3: {'hello', 'world_is_a'}, None: {'world_is_dict'}, 5: {'world_this'}}
is duplicate? True
Code for flattening a dict borrowed from: Flatten nested Python dictionaries, compressing keys.
Upvotes: 0
Reputation: 107134
You can recursively add item values of sub-dicts to a set, and if any item value is already "seen" in the set, raise an exception so that the wrapper can return True
to indicate that a dupe is found:
def has_dupes(d):
def values(d):
seen = set()
for k, v in d.items():
if isinstance(v, dict):
s = values(v)
if seen & s:
raise RuntimeError()
seen.update(s)
else:
if v in seen:
raise RuntimeError()
seen.add(v)
return seen
try:
values(d)
except RuntimeError:
return True
return False
so that given your sample input, has_dupes(dictionary)
would return: True
Upvotes: 0