tkcode

Reputation: 85

Merge two json files containing dict & list into single json using python?

I'm trying to merge two JSON files into a single JSON file using Python.

File1:

{
    "key1":    "protocol1",
    "key2":     [
            {
                    "name": "user.name",
                    "value": "[email protected]"
            },
            {
                    "name": "user.shortname",
                    "value": "user"
            },
            {
                    "name": "proxyuser.hosts",
                    "value": "*"
            },
            {
                    "name": "kb.groups",
                    "value": "hadoop,users,localusers"
            },        
            {
                    "name": "proxy.groups",
                    "value": "group1, group2, group3"
            },
            {
                    "name": "internal.user.groups",
                    "value": "group1, group2"
            }
    ]
}

File2:

{
    "key1":    "protocol1",
    "key2":     [
            {
                    "name": "user.name",
                    "value": "[email protected]"
            },
            {
                    "name": "user.shortname",
                    "value": "user"
            },
            {
                    "name": "proxyuser.hosts",
                    "value": "*"
            },
            {
                    "name": "kb.groups",
                    "value": ""
            },        
            {
                    "name": "proxy.groups",
                    "value": "group3, group4, group5"
            },
            {
                    "name": "internal.groups",
                    "value": "none"
            }
    ]
}

Final expected result:

{
    "key1":    "protocol1",
    "key2":     [
            {
                    "name": "user.name",
                    "value": "[email protected], [email protected]"
            },
            {
                    "name": "user.shortname",
                    "value": "user"
            },
            {
                    "name": "proxyuser.hosts",
                    "value": "*"
            },
            {
                    "name": "kb.groups",
                    "value": "hadoop,users,localusers"
            },        
            {
                    "name": "proxy.groups",
                    "value": "group1, group2, group3, group4, group5"
            },
            {
                    "name": "internal.user.groups",
                    "value": "group1, group2"
            },
            {
                    "name": "internal.groups",
                    "value": "none"
            }
    ]
}

I need to merge based on below rules:

  1. If the 'name' key within the list (key2) matches in both files, concatenate the values.

    e.g.

    File1:

    "key2": [{"name" : "firstname", "value" : "bob"}]
    

    File2:

    "key2": [{"name" : "firstname", "value" : "charlie"}]
    

    Final output:

    "key2": [{"name" : "firstname", "value" : "bob, charlie"}]
    

Some considerations while appending the values:

I've written a Python script to load the two JSON files and merge them, but it seems to just concatenate everything into the first JSON file.

    def merge(a, b):
        "merges b into a"
        for key in b:
            if key in a:# if key is in both a and b
                if key == "key1":
                    pass
                elif key == "key2":
                    for d1, d2 in zip(a[key], b[key]):
                        for key, value in d1.items():
                            if value != d2[key]:
                                a.append({"name": d2[key], "value": d2["value"]})
                else:
                  a[key] = a[key]+ b[key]
            else: # if the key is not in dict a , add it to dict a
                a.update({key:b[key]})
        return a

Can someone point out how I can compare the "name" values within the key2 list in both files and concatenate the corresponding "value" entries?

Upvotes: 1

Views: 1589

Answers (3)

ggorlen

Reputation: 56945

Here's a solution that runs in linear time, using a dictionary to quickly look up an item in a by its name key. Dictionary b's key2 list is iterated through once, and a is modified in constant time per item as required. Sets are used to eliminate duplicates and handle asterisks.

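def merge(a, b):
    # Map each name in a['key2'] to its entry for constant-time lookup.
    lookup = {o['name']: o for o in a['key2']}

    # Split each comma-separated value in a into a set of stripped tokens.
    for e in a['key2']:
        e['value'] = {x.strip() for x in e['value'].split(",")}

    # Merge b's entries: extend the matching set, or append a new entry.
    for e in b['key2']:
        if e['name'] in lookup:
            lookup[e['name']]['value'].update(x.strip() for x in e['value'].split(","))
        else:
            e['value'] = {x.strip() for x in e['value'].split(",")}
            a['key2'].append(e)

    # Convert the sets back to strings; an asterisk overrides all other values.
    for e in a['key2']:
        if "*" in e['value']:
            e['value'] = "*"
        else:
            e['value'] = ", ".join(sorted(e['value']))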

Sample output:

key1:
    protocol1
key2:
    {'name': 'user.name', 'value': '[email protected], [email protected]'}
    {'name': 'user.shortname', 'value': 'user'}
    {'name': 'proxyuser.hosts', 'value': '*'}
    {'name': 'kb.groups', 'value': ', hadoop, localusers, users'}
    {'name': 'proxy.groups', 'value': 'group1, group2, group3, group4, group5'}
    {'name': 'internal.user.groups', 'value': 'group1, group2'}
    {'name': 'internal.groups', 'value': 'none'}
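
Note that the kb.groups line above starts with a stray ", ". File2 has an empty value for that name, and "".split(",") yields [''], so an empty token ends up in the set. Below is a minimal sketch of one way to avoid this, assuming empty tokens should simply be dropped. The names split_values and merge_skip_empty are illustrative and not part of the answer above; copying b's entries rather than appending them directly is also an assumption.

```python
def split_values(value):
    """Split a comma-separated string into a set of non-empty, stripped tokens."""
    return {token.strip() for token in value.split(",") if token.strip()}


def merge_skip_empty(a, b):
    """Merge b['key2'] into a['key2'] in place, ignoring empty tokens."""
    lookup = {entry["name"]: entry for entry in a["key2"]}

    # Convert every value in a to a set so duplicates collapse.
    for entry in a["key2"]:
        entry["value"] = split_values(entry["value"])

    for entry in b["key2"]:
        if entry["name"] in lookup:
            lookup[entry["name"]]["value"] |= split_values(entry["value"])
        else:
            # Copy the entry so that b is left untouched (an assumption).
            new_entry = {"name": entry["name"], "value": split_values(entry["value"])}
            a["key2"].append(new_entry)
            lookup[entry["name"]] = new_entry

    # Turn the sets back into strings; an asterisk overrides all other values.
    for entry in a["key2"]:
        values = entry["value"]
        entry["value"] = "*" if "*" in values else ", ".join(sorted(values))
    return a


a = {"key1": "protocol1", "key2": [{"name": "kb.groups", "value": "hadoop,users,localusers"}]}
b = {"key1": "protocol1", "key2": [{"name": "kb.groups", "value": ""}]}
print(merge_skip_empty(a, b)["key2"])
```

Because the tokens are sorted before joining, the result is ordered alphabetically rather than in the original file order, so it will not match the expected output in the question character for character.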

Upvotes: 2

Serge Ballesta

Reputation: 148910

The order of elements in a["key2"] and b["key2"] is not guaranteed to be the same, so you should build a mapping from each "name" value to its index in a["key2"], and then iterate over b["key2"], looking up each "name" value in that dict.

Code could be:

def merge(a, b):
    "merges b into a"
    for key in b:
        if key in a:# if key is in both a and b
            if key == "key2":
                # build a mapping from names from a[key2] to the member index
                akey2 = { d["name"]: i for i,d in enumerate(a[key]) }
                for d2 in b[key]:      # browse b["key2"]
                    if d2["name"] in akey2:   # a name from a["key2"] matches
                        a[key][akey2[d2["name"]]]["value"] += ", " + d2["value"]
                    else:
                        a[key].append(d2)     # when no match
        else: # if the key is not in dict a , add it to dict a
            a[key] = b[key]
    return a

You can then test it:

import pprint

a = {"key1":    "value1",
     "key2": [{"name" : "firstname", "value" : "bob"}]
     }
b = {"key1":    "value2",
     "key2": [{"name" : "firstname", "value" : "charlie"},
          {"name" : "foo", "value": "bar"}]
     }
merge(a, b)

pprint.pprint(a)

gives as expected:

{'key1': 'value1',
 'key2': [{'name': 'firstname', 'value': 'bob, charlie'},
          {'name': 'foo', 'value': 'bar'}]}
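
Since the question starts from two files on disk, here is a rough sketch of how the same index-based merge might be wrapped with the standard json module. The function names merge_key2 and merge_files are hypothetical, and the sketch assumes both files contain a top-level "key2" list shaped like the one in the question.

```python
import json


def merge_key2(a, b):
    """Append b's "key2" values onto a's entries with the same "name"."""
    # Map each name in a["key2"] to its position in the list.
    index_by_name = {d["name"]: i for i, d in enumerate(a["key2"])}
    for d2 in b["key2"]:
        if d2["name"] in index_by_name:
            a["key2"][index_by_name[d2["name"]]]["value"] += ", " + d2["value"]
        else:
            a["key2"].append(d2)  # no entry with this name in a yet
    return a


def merge_files(path_a, path_b, path_out):
    """Load two JSON files, merge the second into the first, write the result."""
    with open(path_a) as fa, open(path_b) as fb:
        a = json.load(fa)
        b = json.load(fb)
    merged = merge_key2(a, b)
    with open(path_out, "w") as out:
        json.dump(merged, out, indent=4)
    return merged
```

It could be called as merge_files("file1.json", "file2.json", "merged.json"), where the three file names are placeholders. Like the code above, it concatenates values as they are and does not remove duplicates.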

Upvotes: 1

vash_the_stampede

Reputation: 4606

Just loop through the keys: if a key is not in the new dict, add it; if it is, merge the two values.

d1 = {"name" : "firstname", "value" : "bob"}
d2 = {"name" : "firstname", "value" : "charlie"}
d3 = {}

for i in d1:
    for j in d2:
        if i not in d3:
            d3[i] = d1[i]
        else:
            d3[i] = '{}, {}'.format(d1[i], d2[i])

print(d3)
(xenial)vash@localhost:~/python/stack_overflow$ python3.7 formats.py 
{'name': 'firstname, firstname', 'value': 'bob, charlie'}
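
In the output above the "name" field is doubled as well ('firstname, firstname'), because every key is merged the same way. A minimal sketch of merging only "value" when the two names match, assuming that is the intended result (merge_pair is a hypothetical helper name):

```python
def merge_pair(d1, d2):
    """Return a new dict combining d1 and d2 when their "name" fields match."""
    if d1["name"] != d2["name"]:
        raise ValueError("entries have different names")
    # Keep the shared name once and join only the values.
    return {"name": d1["name"], "value": "{}, {}".format(d1["value"], d2["value"])}


d1 = {"name": "firstname", "value": "bob"}
d2 = {"name": "firstname", "value": "charlie"}
print(merge_pair(d1, d2))
```

This handles a single pair of entries; applying it across the two key2 lists still requires matching entries by name first.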

Upvotes: 0
