I am trying to clean a JSON object by removing keys whose value is 'N/A', '-', or '', and likewise removing any of these values from lists. Here is an example of an object to be cleaned:
```python
dirty = {
    'name': {'first': 'Robert', 'middle': '', 'last': 'Smith'},
    'age': 25,
    'DOB': '-',
    'hobbies': ['running', 'coding', '-'],
    'education': {'highschool': 'N/A', 'college': 'Yale'}
}
```
I found a similar question and adapted its solution, which gave me this function:
```python
def clean_data(value):
    """
    Recursively remove all values of 'N/A', '-', and ''
    from dictionaries and lists, and return
    the result as a new dictionary or list.
    """
    missing_indicators = set(['N/A', '-', ''])
    if isinstance(value, list):
        return [clean_data(x) for x in value if x not in missing_indicators]
    elif isinstance(value, dict):
        return {
            key: clean_data(val)
            for key, val in value.items()
            if val not in missing_indicators
        }
    else:
        return value
```
But the dictionary comprehension raises `TypeError: unhashable type: 'dict'`:
```
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-79-d42b5f1acaff> in <module>
----> 1 clean_data(dirty)

<ipython-input-72-dde33dbf1804> in clean_data(value)
     11         return {
     12             key: clean_data(val)
---> 13             for key, val in value.items()
     14             if val not in missing_indicators
     15         }

<ipython-input-72-dde33dbf1804> in <dictcomp>(.0)
     12             key: clean_data(val)
     13             for key, val in value.items()
---> 14             if val not in missing_indicators
     15         }
     16     else:

TypeError: unhashable type: 'dict'
```
Clearly the set membership test doesn't work the way I expect when `val` is a `dict`. Can anyone enlighten me?
At first glance, this line looks like a problem:

```python
if val not in missing_indicators
```

When you use `in` on a `set`, Python checks whether the value you're asking about is among the set's entries, and it does that lookup by hashing the value. To be a key in a `dict` or a member of a `set`, a Python value must be hashable. You can check whether a value is hashable by calling `hash` on it:
```
>>> hash(1)
1
>>> hash("hello")
7917781502247088526
>>> hash({"1":"2"})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'
```
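To check several values without letting the exception escape, you can wrap `hash` in `try`/`except`. This is a minimal sketch; `is_hashable` is a hypothetical helper name, not a built-in.

```python
def is_hashable(value):
    """Return True if hash(value) succeeds, False if it raises TypeError."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


print(is_hashable(1))            # True
print(is_hashable("hello"))      # True
print(is_hashable({"1": "2"}))   # False: dicts are unhashable
print(is_hashable(["a", "b"]))   # False: lists are unhashable too
```

Calling `hash` directly is more reliable than `isinstance(value, collections.abc.Hashable)`, which returns `True` for a tuple such as `(1, [2])` even though hashing that tuple fails.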
In your snippet, `val` is sometimes a `dict` (for example, the value stored under `'name'`), and you are asking Python whether that `val` is one of the values in the `set`. To answer, Python attempts to hash `val`, and this fails.
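As a quick check, the sketch below runs the same membership test on each value of the question's `dirty` object separately and records the type of every value that raises. The original comprehension stops at the first failure (`'name'`, since dicts iterate in insertion order), which is why the traceback only mentions `dict`; lists fail in the same way.

```python
dirty = {
    'name': {'first': 'Robert', 'middle': '', 'last': 'Smith'},
    'age': 25,
    'DOB': '-',
    'hobbies': ['running', 'coding', '-'],
    'education': {'highschool': 'N/A', 'college': 'Yale'},
}
missing_indicators = {'N/A', '-', ''}

# Perform the comprehension's membership test one value at a time,
# recording the type name of each value whose lookup raises TypeError.
failing = {}
for key, val in dirty.items():
    try:
        _ = val in missing_indicators
    except TypeError:
        failing[key] = type(val).__name__

print(failing)  # {'name': 'dict', 'hobbies': 'list', 'education': 'dict'}
```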
The hurdle you have to overcome is that some of the values in your outer `dict` are themselves `dict`s, whereas others are a `list`, `str` or `int`. Both `dict` and `list` are unhashable, so the set lookup fails on either one. You will need a different strategy for each case: check what type of thing `val` is first, and then act accordingly.
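A minimal sketch of that approach, assuming that only strings can be missing markers: the type check runs before the set lookup, so dicts and lists are never hashed. `is_missing` and `MISSING_INDICATORS` are hypothetical names introduced here; the rest keeps the structure of the function from the question.

```python
MISSING_INDICATORS = {'N/A', '-', ''}


def is_missing(value):
    # Assumption: only strings can be 'N/A', '-' or ''. Checking the type
    # first means dicts, lists and ints never reach the (hashing) set lookup.
    return isinstance(value, str) and value in MISSING_INDICATORS


def clean_data(value):
    """
    Recursively remove all values of 'N/A', '-', and ''
    from dictionaries and lists, and return
    the result as a new dictionary or list.
    """
    if isinstance(value, list):
        return [clean_data(x) for x in value if not is_missing(x)]
    elif isinstance(value, dict):
        return {
            key: clean_data(val)
            for key, val in value.items()
            if not is_missing(val)
        }
    else:
        return value


dirty = {
    'name': {'first': 'Robert', 'middle': '', 'last': 'Smith'},
    'age': 25,
    'DOB': '-',
    'hobbies': ['running', 'coding', '-'],
    'education': {'highschool': 'N/A', 'college': 'Yale'},
}

print(clean_data(dirty))
# {'name': {'first': 'Robert', 'last': 'Smith'}, 'age': 25,
#  'hobbies': ['running', 'coding'], 'education': {'college': 'Yale'}}
```

Note that a nested dict or list whose entries are all missing is kept as an empty `{}` or `[]`; dropping those would need an extra step. Another option is to test against a tuple, `val in ('N/A', '-', '')`, because tuple membership compares with `==` and never hashes; the explicit `isinstance` check just makes the intent more visible.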