gaucho_1789

Reputation: 13

Convert values of a dictionary to lowercase

I have a dictionary in Python that contains a nested dictionary, and some of its values are lists (including lists of lists).

I want to print the dictionary with all of its strings lowercased, without changing anything else. The only code I have iterates over the values of a plain dict; it doesn't handle lists or nested dicts. Any suggestions for how to fix this? Here's the code I have so far:

new_data = {}
for i in range (0, len(json_file)):
    try: 
        data = json_file[i]['payload']
        for key, value in data.iteritems():
            new_data[value.lower()] = value
            print (new_data)
    except:
        continue

And this is the nested dictionary:

{
  "payload": {
    "existence_full": 1,
    "geo_virtual": "[\"56.9459720|-2.1971226|20|within_50m|4\"]",
    "latitude": "56.945972",
    "locality": "Stonehaven",
    "_records_touched": "{\"crawl\":8,\"lssi\":0,\"polygon_centroid\":0,\"geocoder\":0,\"user_submission\":0,\"tdc\":0,\"gov\":0}",
    "address": "The Lodge, Dunottar",
    "email": "[email protected]",
    "existence_ml": 0.56942382176587,
    "domain_aggregate": "",
    "name": "Dunnottar Castle",
    "search_tags": [
      "Dunnottar Castle Aberdeenshire",
      "Dunotter Castle"
    ],
    "admin_region": "Scotland",
    "existence": 1,
    "category_labels": [
      [
        "Landmarks",
        "Buildings and Structures"
      ]
    ],
    "post_town": "Stonehaven",
    "region": "Kincardineshire",
    "review_count": "719",
    "geocode_level": "within_50m",
    "tel": "01569 762173",
    "placerank": 65,
    "longitude": "-2.197123",
    "placerank_ml": 37.279160734645,
    "fax": "01330 860325",
    "category_ids_text_search": "",
    "website": "http:\/\/www.dunnottarcastle.co.uk",
    "status": "1",
    "geocode_confidence": "20",
    "postcode": "AB39 2TL",
    "category_ids": [
      108
    ],
    "country": "gb",
    "_geocode_quality": "4"
  },
  "uuid": "3867aaf3-12ab-434f-b12b-5d627b3359c3"
}

Upvotes: 0

Views: 1026

Answers (1)

abarnert

Reputation: 365657

Nick A pointed out an interesting shortcut in the comments: Your dict looks like it's JSON-compatible. If so, can we just convert to JSON, lowercase that string, then convert back? The answer depends on what exactly "JSON-compatible" means. There are a number of slightly different JSON standards: json.org, ECMA 404, and RFCs 4627, 7158, 7159, and 8259. And then there's the way JSON is actually used in practice, and the way it's implemented by Python's json module. But I'll summarize here:

lowered = json.loads(json.dumps(d, ensure_ascii=False).lower())

… will work as long as:

  • Your values are all of type dict, list, str, float, int, bool, and NoneType.
  • Your dict values only have str keys.
  • Your list and dict values aren't circular (e.g., lst = []; lst.append(lst)).
  • Your float values will never include math.inf or math.nan.
  • You're using a recent-ish Python (3.6 is fine), or will never have any non-BMP letters like 𞤀.
  • You're using a recent-ish Python (3.6 is fine), or will never have any int values outside range(-(2**53)+1, (2**53)).
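To make a few of those conditions concrete, here is a small sketch of what happens when they're violated. The helper names lower_via_json and try_lower are just made up for this illustration, and the behavior shown is what I'd expect from CPython's stdlib json module; other JSON libraries may differ.

```python
import json
import math


def lower_via_json(obj):
    """The one-liner from above, wrapped in a function."""
    return json.loads(json.dumps(obj, ensure_ascii=False).lower())


def try_lower(obj):
    """Return the lowered object, or the name of the exception raised."""
    try:
        return lower_via_json(obj)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


# Non-str keys: int keys are silently turned into str keys,
# while keys like tuples are rejected by json.dumps.
int_keys = try_lower({1: "A"})
tuple_keys = try_lower({(1, 2): "A"})

# Circular structures are rejected by json.dumps.
lst = []
lst.append(lst)
circular = try_lower(lst)

# inf is dumped as Infinity; after lower() it no longer parses.
infinite = try_lower({"x": math.inf})

print(int_keys, tuple_keys, circular, infinite)
```

The int-key case is the sneaky one, because nothing fails: you just get a dict back with different key types.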

Notice the ensure_ascii=False. This is necessary if you might have any non-ASCII letters, because 'É'.lower() is 'é', but r'\u00c9'.lower() does nothing.
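A quick sketch of the difference, using a made-up dict with non-ASCII capitals (the lower_via_json helper and its ensure_ascii parameter exist only for this demo):

```python
import json

d = {"Name": "CAFÉ", "Tags": ["ÉCOLE", "Dunnottar Castle"]}


def lower_via_json(obj, ensure_ascii=False):
    """Round-trip obj through JSON text, lowercasing the text in between."""
    return json.loads(json.dumps(obj, ensure_ascii=ensure_ascii).lower())


good = lower_via_json(d)                     # É stays a real character
bad = lower_via_json(d, ensure_ascii=True)   # É becomes the escape \u00c9

print(good)
print(bad)
```

With the escape in place, lower() only sees backslash, u, and hex digits, so the É comes back from loads still uppercase.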

For JSON that you receive over the wire or in a file, instead of creating it yourself with dumps, of course you can't trust strings to not be escaped. But you can always loads it first, then dumps it, lowercase the resulting string, and loads again. (For this case, you might want to add allow_nan=False to catch inf and nan values early, where they're easier to debug.)
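As a rough sketch of that loads, dumps, lower, loads sequence (lower_json_text is a hypothetical name, and the sample strings are invented):

```python
import json


def lower_json_text(text):
    """Lowercase every string in a JSON document received as text."""
    obj = json.loads(text)
    # Re-dump so non-ASCII characters are literal rather than \uXXXX escapes,
    # and refuse inf/nan here instead of failing later in the second loads.
    normalized = json.dumps(obj, ensure_ascii=False, allow_nan=False)
    return json.loads(normalized.lower())


wire = '{"Name": "CAF\\u00c9", "Tags": ["Dunnottar Castle"]}'
result = lower_json_text(wire)
print(result)

# Python's loads accepts NaN by default, so allow_nan=False is what catches it.
try:
    lower_json_text('{"x": NaN}')
    nan_rejected = False
except ValueError:
    nan_rejected = True
```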

Using the third-party library simplejson (which the stdlib json is based on) will probably eliminate the requirements for recent Python, and may provide workarounds for some other possible issues, but I haven't tried it.


If this hack isn't acceptable for whatever reason, the cleaner way to do it is to recurse through the structure. A simple version looks like this:

def recursive_lower(obj):
    if isinstance(obj, str):
        return obj.lower()
    elif isinstance(obj, list):
        return [recursive_lower(elem) for elem in obj]
    elif isinstance(obj, dict):
        return {key.lower(): recursive_lower(value) for key, value in obj.items()}
    else:
        return obj
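Here's how that might look on a trimmed-down version of the dict from the question (only a handful of the original fields are kept, to keep the example short):

```python
def recursive_lower(obj):
    """Return a lowercased copy of obj, descending into lists and dicts."""
    if isinstance(obj, str):
        return obj.lower()
    elif isinstance(obj, list):
        return [recursive_lower(elem) for elem in obj]
    elif isinstance(obj, dict):
        return {key.lower(): recursive_lower(value) for key, value in obj.items()}
    else:
        return obj


record = {
    "payload": {
        "name": "Dunnottar Castle",
        "search_tags": ["Dunnottar Castle Aberdeenshire", "Dunotter Castle"],
        "category_labels": [["Landmarks", "Buildings and Structures"]],
        "postcode": "AB39 2TL",
        "placerank": 65,
        "category_ids": [108],
    },
    "uuid": "3867aaf3-12ab-434f-b12b-5d627b3359c3",
}

lowered = recursive_lower(record)
print(lowered["payload"]["search_tags"])
print(lowered["payload"]["category_labels"])
```

Numbers pass through untouched, and the function builds a new structure, so record itself is not modified.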

Of course the reason you're not using JSON is presumably that your types don't all map directly to JSON, which means they probably won't work with the above. But you can easily extend it as needed. For example:

To handle non-string keys, replace the dict clause:

    elif isinstance(obj, dict):
        return {recursive_lower(key): recursive_lower(value) for key, value in obj.items()}

To handle tuples and other sequences that aren't lists, add import collections.abc at the top and replace the list clause (make sure this comes after the str check, because str is a Sequence type…):

    elif isinstance(obj, collections.abc.Sequence):
        return type(obj)(map(recursive_lower, obj))

To handle bytes (as pure ASCII strings), change the str part:

    if isinstance(obj, (str, bytes)):
        return obj.lower()

To duck-type anything with a lower method (str, bytes, bytearray, various third-party types), change the str part:

    try:
        return obj.lower()
    except AttributeError:
        pass

And so on.
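Putting several of those variations together, one possible combined version might look like the sketch below. This is just one way to assemble the pieces, not the only reasonable ordering, and the sample data is invented:

```python
import collections.abc


def recursive_lower(obj):
    """Lowercase anything with a lower() method, recursing into containers."""
    # Duck-typed check first: covers str, bytes, bytearray, and similar types,
    # and keeps them away from the Sequence clause below.
    try:
        return obj.lower()
    except AttributeError:
        pass
    if isinstance(obj, dict):
        # Keys are lowered recursively too, so non-string keys are fine.
        return {recursive_lower(key): recursive_lower(value)
                for key, value in obj.items()}
    if isinstance(obj, collections.abc.Sequence):
        # Rebuild the same sequence type (list, tuple, ...) from lowered items.
        return type(obj)(map(recursive_lower, obj))
    return obj


mixed = {("A", 1): [b"XY", ("Q", 2.5)], 3: "Z"}
lowered = recursive_lower(mixed)
print(lowered)
```

Two caveats with this sketch: type(obj)(...) only works for sequence types whose constructor accepts a single iterable (list and tuple do, but range and namedtuple classes don't), and lowercasing keys can merge entries that differed only by case, such as "Name" and "name".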

Upvotes: 1
