MaverickD

Reputation: 1657

Load duplicate keys from a nested JSON file as a dictionary of lists

I have a JSON file in this format:

{
  "details": {
    "hawk_branch": {
      "tandem": {
        "value": "4210bnd72"
      }
    },
    "uclif_branch": {
      "tandem": {
        "value": "e2nc712nma89",
        "value": "23s24212",
        "value": "12338cm82"
      }
    }
  }
}

The problem is that I need to keep all the values. However, when I use json.load to load this file I only get one value, which makes sense since a dict can only hold unique keys.

Here is the expected output:
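
For illustration, here is a minimal reproduction of that behavior, using an inline string in place of the file (the variable names text and data are only for this sketch). With no hook, the standard json module keeps only the last pair for a repeated key:

```python
import json

# The same key repeated inside one object, as in the file above.
text = '{"tandem": {"value": "e2nc712nma89", "value": "23s24212", "value": "12338cm82"}}'

# With no hook, each JSON object becomes a plain dict,
# so later pairs overwrite earlier ones with the same key.
data = json.loads(text)
print(data)  # {'tandem': {'value': '12338cm82'}}
```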

{ "hawk_branch": ["4210bnd72"] }
{ "uclif_branch": ["e2nc712nma89" , "23s24212", "12338cm82"] }

I have read the answer to "Python json parser allow duplicate keys", which suggests using object_pairs_hook like this:

import json

def parse_object_pairs(pairs):
    # pairs is a list of (key, value) tuples for one JSON object
    return pairs

# f is an open file object
json.load(f, object_pairs_hook=parse_object_pairs)

but it returns the entire JSON file as nested lists of (key, value) pairs rather than dicts.

I think it's possible to do this using a lambda as object_pairs_hook, but I can't figure out how to use it.

Can someone please guide me?

Upvotes: 2

Views: 1724

Answers (1)

blhsing

Reputation: 106638

You can pass a custom duplicate-key resolver as object_pairs_hook that turns the values of repeated value keys into a list:

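def value_resolver(pairs):
    # pairs holds the (key, value) tuples of one JSON object, duplicates included.
    # The "pairs and" guard keeps an empty object {} as a dict instead of [].
    if pairs and all(k == 'value' for k, _ in pairs):
        return [v for _, v in pairs]
    return dict(pairs)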

so that:

json.load(f, object_pairs_hook=value_resolver)

returns:

{'details': {'hawk_branch': {'tandem': ['4210bnd72']}, 'uclif_branch': {'tandem': ['e2nc712nma89', '23s24212', '12338cm82']}}}
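
To get from this result to the exact shape asked for in the question, one more step is needed. The following is a minimal sketch, assuming every branch under details has a tandem entry as in the sample; the names text and by_branch are made up for this example, and the JSON is inlined as a string instead of being read from a file:

```python
import json

def value_resolver(pairs):
    # An object made only of "value" keys becomes a list of its values.
    if pairs and all(k == 'value' for k, _ in pairs):
        return [v for _, v in pairs]
    return dict(pairs)

text = '''
{
  "details": {
    "hawk_branch": {"tandem": {"value": "4210bnd72"}},
    "uclif_branch": {"tandem": {"value": "e2nc712nma89",
                                "value": "23s24212",
                                "value": "12338cm82"}}
  }
}
'''

data = json.loads(text, object_pairs_hook=value_resolver)

# Assumption: each branch contains a "tandem" key holding the list.
by_branch = {branch: body['tandem'] for branch, body in data['details'].items()}
print(by_branch)
# {'hawk_branch': ['4210bnd72'], 'uclif_branch': ['e2nc712nma89', '23s24212', '12338cm82']}
```

If some branches might lack tandem, body.get('tandem', []) would avoid a KeyError.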

To dump the new data structure back to the original JSON format, converting each list back into an object with duplicate value keys, you can use a custom json.JSONEncoder subclass:

import json

class restore_value(json.JSONEncoder):
    def encode(self, o):
        if isinstance(o, dict):
            # Encode objects by hand so nested lists reach the branch below.
            return '{%s}' % ', '.join(
                ': '.join((json.encoder.py_encode_basestring(k), self.encode(v)))
                for k, v in o.items())
        if isinstance(o, list):
            # Emit each list item as a repeated "value" key.
            return '{%s}' % ', '.join('"value": %s' % self.encode(v) for v in o)
        return super().encode(o)

so that:

d = {'details': {'hawk_branch': {'tandem': ['4210bnd72']}, 'uclif_branch': {'tandem': ['e2nc712nma89', '23s24212', '12338cm82']}}}
print(json.dumps(d, cls=restore_value))

would output:

{"details": {"hawk_branch": {"tandem": {"value": "4210bnd72"}}, "uclif_branch": {"tandem": {"value": "e2nc712nma89", "value": "23s24212", "value": "12338cm82"}}}}
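
As a quick sanity check, the two pieces can be run back to back to confirm that they round-trip. This is only a sketch that repeats the resolver and encoder so it runs on its own; the names text and restored are made up for the example:

```python
import json

def value_resolver(pairs):
    # An object made only of "value" keys becomes a list of its values.
    if pairs and all(k == 'value' for k, _ in pairs):
        return [v for _, v in pairs]
    return dict(pairs)

class restore_value(json.JSONEncoder):
    def encode(self, o):
        if isinstance(o, dict):
            return '{%s}' % ', '.join(
                ': '.join((json.encoder.py_encode_basestring(k), self.encode(v)))
                for k, v in o.items())
        if isinstance(o, list):
            # Each list item is written back as a repeated "value" key.
            return '{%s}' % ', '.join('"value": %s' % self.encode(v) for v in o)
        return super().encode(o)

d = {'details': {'hawk_branch': {'tandem': ['4210bnd72']},
                 'uclif_branch': {'tandem': ['e2nc712nma89', '23s24212', '12338cm82']}}}

text = json.dumps(d, cls=restore_value)                       # lists -> duplicate keys
restored = json.loads(text, object_pairs_hook=value_resolver)  # duplicate keys -> lists
print(restored == d)  # True
```

Note that this encoder treats every list as a run of value keys, so it only suits data shaped like the example.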

Upvotes: 4
