Reputation: 13
I have a dictionary in python that has a dictionary within it and some values are placed in an array or list.
I want to print the dictionary to be all lowercase without changing anything else but the only code I have only iterates through values of a plain dict, it doesn't deal with arrays or lists. Any suggestions for how to fix this? Here's the code I have so far:
new_data = {}
for i in range (0, len(json_file)):
try:
data = json_file[i]['payload']
for key, value in data.iteritems():
new_data[value.lower()] = value
print (new_data)
except:
continue
And this is the nested dictionary:
{
"payload": {
"existence_full": 1,
"geo_virtual": "[\"56.9459720|-2.1971226|20|within_50m|4\"]",
"latitude": "56.945972",
"locality": "Stonehaven",
"_records_touched": "{\"crawl\":8,\"lssi\":0,\"polygon_centroid\":0,\"geocoder\":0,\"user_submission\":0,\"tdc\":0,\"gov\":0}",
"address": "The Lodge, Dunottar",
"email": "[email protected]",
"existence_ml": 0.56942382176587,
"domain_aggregate": "",
"name": "Dunnottar Castle",
"search_tags": [
"Dunnottar Castle Aberdeenshire",
"Dunotter Castle"
],
"admin_region": "Scotland",
"existence": 1,
"category_labels": [
[
"Landmarks",
"Buildings and Structures"
]
],
"post_town": "Stonehaven",
"region": "Kincardineshire",
"review_count": "719",
"geocode_level": "within_50m",
"tel": "01569 762173",
"placerank": 65,
"longitude": "-2.197123",
"placerank_ml": 37.279160734645,
"fax": "01330 860325",
"category_ids_text_search": "",
"website": "http:\/\/www.dunnottarcastle.co.uk",
"status": "1",
"geocode_confidence": "20",
"postcode": "AB39 2TL",
"category_ids": [
108
],
"country": "gb",
"_geocode_quality": "4"
},
"uuid": "3867aaf3-12ab-434f-b12b-5d627b3359c3"
}
Upvotes: 0
Views: 1026
Reputation: 365657
Nick A pointed out an interesting shortcut in the comments: Your dict looks like it's JSON-compatible. If so, can we just convert to JSON, lowercase that string, then convert back? There are a number of slightly different JSON standards: json.org, ECMA 404, and RFCs 4627, 7158, 7159, and 8259. And then there's the way JSON is actually used in practice. And the way it's implemented by Python's json
module. But I'll summarize here:
lowered = json.loads(json.dumps(d, ensure_ascii=False).lower())
… will work as long as:
dict
, list
, str
, float
, int
, bool
, and NoneType
.dict
values only have str
keys.list
and dict
values aren't circular (e.g., lst = []; lst.append(lst)
).float
values will never include math.inf
or math.nan
.𞤀
.int
values outside range(-(2**53)+1, (2**53))
.Notice the ensure_ascii=False
. This is necessary if you might have any non-ASCII letters, because 'É'.lower()
is 'é'
, but r'\u00c9'.lower()
does nothing.
For JSON that you receive over the wire or in a file, instead of creating it yourself with dumps
, of course you can't trust strings to not be escaped. But you can always loads
it first, then dumps
it to lowercase and loads
again. (For this case, you might want to add allow_nan=False
to catch inf
and nan
values early, where they're easier to debug.)
Using the third-party library simplejson
(which the stdlib json
is based on) will probably eliminate the requirements for recent Python, and may provide workarounds for some other possible issues, but I haven't tried it.
If this hack isn't acceptable for whatever reason, the cleaner way to do it is to recurse through the structure. A simple version looks like this:
def recursive_lower(obj):
if isinstance(obj, str):
return obj.lower()
elif isinstance(obj, list):
return [recursive_lower(elem) for elem in obj]
elif isinstance(obj, dict):
return {key.lower(): recursive_lower(value) for key, value in obj.items()}
else:
return obj
Of course the reason you're not using JSON is presumably that your types don't all map directly to JSON, which means they probably won't work with the above. But you can easily extend it as needed. For example:
To handle non-string keys, replace the dict
clause:
elif isinstance(obj, dict):
return {recursive_lower(key): recursive_lower(value) for key, value in obj.items()}
To handle tuple
s and other sequences that aren't list
s, replace the list
clause (make sure this comes after the str
check, because str
is a Sequence
type…):
elif isinstance(obj, collections.abc.Sequence):
return type(obj)(map(recursive_lower, obj))
To handle bytes
(as pure ASCII strings), change the str
part:
if isinstance(obj, (str, bytes)):
return obj.lower()
To duck-type anything with a lower
method (str
, bytes
, bytearray
, various third-party types), change the str
part:
try:
return obj.lower()
except AttributeError:
pass
And so on.
Upvotes: 1