Reputation: 85
I'm trying to merge two JSON files into a single JSON document using Python.
File1:
{
  "key1": "protocol1",
  "key2": [
    {
      "name": "user.name",
      "value": "[email protected]"
    },
    {
      "name": "user.shortname",
      "value": "user"
    },
    {
      "name": "proxyuser.hosts",
      "value": "*"
    },
    {
      "name": "kb.groups",
      "value": "hadoop,users,localusers"
    },
    {
      "name": "proxy.groups",
      "value": "group1, group2, group3"
    },
    {
      "name": "internal.user.groups",
      "value": "group1, group2"
    }
  ]
}
File2:
{
  "key1": "protocol1",
  "key2": [
    {
      "name": "user.name",
      "value": "[email protected]"
    },
    {
      "name": "user.shortname",
      "value": "user"
    },
    {
      "name": "proxyuser.hosts",
      "value": "*"
    },
    {
      "name": "kb.groups",
      "value": ""
    },
    {
      "name": "proxy.groups",
      "value": "group3, group4, group5"
    },
    {
      "name": "internal.groups",
      "value": "none"
    }
  ]
}
Final expected result:
{
  "key1": "protocol1",
  "key2": [
    {
      "name": "user.name",
      "value": "[email protected], [email protected]"
    },
    {
      "name": "user.shortname",
      "value": "user"
    },
    {
      "name": "proxyuser.hosts",
      "value": "*"
    },
    {
      "name": "kb.groups",
      "value": "hadoop,users,localusers"
    },
    {
      "name": "proxy.groups",
      "value": "group1, group2, group3, group4, group5"
    },
    {
      "name": "internal.user.groups",
      "value": "group1, group2"
    },
    {
      "name": "internal.groups",
      "value": "none"
    }
  ]
}
I need to merge them based on the following rule:
if the 'name' key of an entry in the list (key2) matches in both files, then concatenate the values.
e.g.
File1:
"key2": [{"name" : "firstname", "value" : "bob"}]
File2:
"key2": [{"name" : "firstname", "value" : "charlie"}]
Final output:
"key2": [{"name" : "firstname", "value" : "bob, charlie"}]
Some considerations when combining the values:
If both files contain the same value(s) in 'value', the final result should be the union of the values, without duplicates.
If any 'value' contains '*', the final value should be '*'.
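The two rules above can be expressed as a small helper that combines one pair of 'value' strings. This is only a sketch under assumptions the question does not state: merge_values is a hypothetical name, items are split on commas and stripped, '*' counts only when it appears as a whole item, blank items are dropped (the expected result leaves kb.groups unchanged when File2's value is empty), and the union keeps first-seen order rather than sorting. Separators are normalized to ", ".

```python
def merge_values(v1, v2):
    """Combine two comma-separated 'value' strings (hypothetical helper).

    Assumptions: items are split on "," and stripped, blank items are
    dropped, and the union keeps first-seen order instead of sorting.
    """
    items = []
    for value in (v1, v2):
        for item in value.split(","):
            item = item.strip()
            # Skip blanks (e.g. an empty "value") and items already seen
            if item and item not in items:
                items.append(item)
    # Rule 2: a "*" item anywhere makes the whole value "*"
    if "*" in items:
        return "*"
    # Rule 1: otherwise return the union, re-joined with ", "
    return ", ".join(items)


print(merge_values("bob", "charlie"))  # bob, charlie
print(merge_values("group1, group2, group3", "group3, group4, group5"))  # group1, group2, group3, group4, group5
print(merge_values("*", "host1"))  # *
```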
I've written a Python script to load the two JSON files and merge them, but it seems to just concatenate everything into the first JSON file.
def merge(a, b):
    "merges b into a"
    for key in b:
        if key in a:  # if key is in both a and b
            if key == "key1":
                pass
            elif key == "key2":
                for d1, d2 in zip(a[key], b[key]):
                    for key, value in d1.items():
                        if value != d2[key]:
                            a.append({"name": d2[key], "value": d2["value"]})
            else:
                a[key] = a[key] + b[key]
        else:  # if the key is not in dict a, add it to dict a
            a.update({key: b[key]})
    return a
Can someone point out how I can match the "name" values of the key2 lists in both files and concatenate the corresponding "value" entries?
Upvotes: 1
Views: 1589
Reputation: 56945
Here's a solution that runs in linear time, using a dictionary to look up an item in a['key2'] by its name key. The key2 list in b is iterated through once, and a is modified in constant time per item as required. Sets are used to eliminate duplicates and handle asterisks.
def merge(a, b):
    """Merge b['key2'] into a['key2'] in place."""
    def split_values(s):
        # "x, y, z" -> {"x", "y", "z"}; blank items (e.g. an empty value) are dropped
        return {x.strip() for x in s.split(",") if x.strip()}

    # Map each name in a['key2'] to its dict for constant-time lookup
    lookup = {o['name']: o for o in a['key2']}
    for e in a['key2']:
        e['value'] = split_values(e['value'])
    for e in b['key2']:
        if e['name'] in lookup:
            # Name present in both files: take the union of the values
            lookup[e['name']]['value'].update(split_values(e['value']))
        else:
            # Name only in b: append it to a
            e['value'] = split_values(e['value'])
            a['key2'].append(e)
    # Turn the sets back into strings; any "*" wins
    for e in a['key2']:
        if "*" in e['value']:
            e['value'] = "*"
        else:
            e['value'] = ", ".join(sorted(e['value']))
Sample output:
key1:
protocol1
key2:
{'name': 'user.name', 'value': '[email protected], [email protected]'}
{'name': 'user.shortname', 'value': 'user'}
{'name': 'proxyuser.hosts', 'value': '*'}
{'name': 'kb.groups', 'value': 'hadoop, localusers, users'}
{'name': 'proxy.groups', 'value': 'group1, group2, group3, group4, group5'}
{'name': 'internal.user.groups', 'value': 'group1, group2'}
{'name': 'internal.groups', 'value': 'none'}
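A variation on the same dictionary-lookup idea, for readers who want the question's expected ordering: this sketch keeps values in first-seen order instead of sorting them, builds a new document rather than modifying a in place, and shows where json.load and json.dump fit when working with files. merge_values, merge_docs and merge_files are hypothetical names, the file paths are placeholders, and it relies on dicts preserving insertion order (Python 3.7+).

```python
import json


def merge_values(v1, v2):
    # Union of comma-separated items in first-seen order; blank items dropped
    items = []
    for value in (v1, v2):
        for item in value.split(","):
            item = item.strip()
            if item and item not in items:
                items.append(item)
    # Any "*" item makes the whole value "*"
    return "*" if "*" in items else ", ".join(items)


def merge_docs(a, b):
    """Return a new document with b['key2'] merged into a['key2'] by name."""
    # Keep a's other keys (e.g. key1); keys that only exist in b are ignored
    merged = {k: v for k, v in a.items() if k != "key2"}
    # Copies of a's entries, keyed by name, in their original order
    by_name = {e["name"]: dict(e) for e in a["key2"]}
    for e in b["key2"]:
        if e["name"] in by_name:
            old = by_name[e["name"]]
            old["value"] = merge_values(old["value"], e["value"])
        else:
            by_name[e["name"]] = dict(e)  # name only in b: copied as-is, appended
    merged["key2"] = list(by_name.values())
    return merged


def merge_files(path1, path2, out_path):
    # Hypothetical helper: load two JSON files, merge them, write the result
    with open(path1) as f1, open(path2) as f2:
        a, b = json.load(f1), json.load(f2)
    with open(out_path, "w") as out:
        json.dump(merge_docs(a, b), out, indent=2)


# In-memory example using a subset of the question's entries
a = {"key1": "protocol1",
     "key2": [{"name": "proxy.groups", "value": "group1, group2, group3"},
              {"name": "kb.groups", "value": "hadoop,users,localusers"}]}
b = {"key1": "protocol1",
     "key2": [{"name": "kb.groups", "value": ""},
              {"name": "proxy.groups", "value": "group3, group4, group5"},
              {"name": "internal.groups", "value": "none"}]}
print(json.dumps(merge_docs(a, b), indent=2))
```

With real files this would be called as merge_files("file1.json", "file2.json", "merged.json"), where the three paths are placeholders.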
Upvotes: 2
Reputation: 148910
The order of the elements in a["key2"] and b["key2"] is not guaranteed to be the same, so you should build a mapping from each "name" value to its index in a["key2"], and then browse b["key2"], comparing each "name" value against that mapping.
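To see why the order matters, here is a small sketch with hypothetical key2 lists that reuse names from the question but list b's entries in a different order. zip() pairs entries purely by position, while the name-to-index mapping finds each match wherever it sits; a None index marks a name with no match, which the merge would append.

```python
# Hypothetical key2 lists: names from the question, but b is in a different order
a_key2 = [{"name": "user.name"}, {"name": "kb.groups"}, {"name": "internal.user.groups"}]
b_key2 = [{"name": "kb.groups"}, {"name": "user.name"}, {"name": "internal.groups"}]

# zip() pairs entries by position, so none of these pairs share a name
positional = [(d1["name"], d2["name"]) for d1, d2 in zip(a_key2, b_key2)]
print(positional)

# A name -> index mapping finds the matching entry regardless of position
index_of = {d["name"]: i for i, d in enumerate(a_key2)}
matched = [(d2["name"], index_of.get(d2["name"])) for d2 in b_key2]
print(matched)  # [('kb.groups', 1), ('user.name', 0), ('internal.groups', None)]
```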
Code could be:
def merge(a, b):
    "merges b into a"
    for key in b:
        if key in a:  # if key is in both a and b
            if key == "key2":
                # build a mapping from names from a[key2] to the member index
                akey2 = {d["name"]: i for i, d in enumerate(a[key])}
                for d2 in b[key]:  # browse b["key2"]
                    if d2["name"] in akey2:  # a name from a["key2"] matches
                        a[key][akey2[d2["name"]]]["value"] += ", " + d2["value"]
                    else:
                        a[key].append(d2)  # when no match
        else:  # if the key is not in dict a, add it to dict a
            a[key] = b[key]
    return a
You can then test it:
import pprint

a = {"key1": "value1",
     "key2": [{"name": "firstname", "value": "bob"}]
     }
b = {"key1": "value2",
     "key2": [{"name": "firstname", "value": "charlie"},
              {"name": "foo", "value": "bar"}]
     }
merge(a, b)
pprint.pprint(a)
gives as expected:
{'key1': 'value1',
'key2': [{'name': 'firstname', 'value': 'bob, charlie'},
{'name': 'foo', 'value': 'bar'}]}
Upvotes: 1
Reputation: 4606
Just loop through the keys: if a key is not in the new dict, add it; if it is, merge the two values.
d1 = {"name": "firstname", "value": "bob"}
d2 = {"name": "firstname", "value": "charlie"}
d3 = {}

for i in d1:
    for j in d2:
        if i not in d3:
            d3[i] = d1[i]
        else:
            d3[i] = '{}, {}'.format(d1[i], d2[i])

print(d3)
(xenial)vash@localhost:~/python/stack_overflow$ python3.7 formats.py
{'name': 'firstname, firstname', 'value': 'bob, charlie'}
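The output above concatenates the "name" field as well ('firstname, firstname'), whereas the question expects a single name with only the values joined. A minimal variant of the same two-dict example that compares the names first and joins only the "value" fields (it still does not deduplicate values or apply the '*' rule):

```python
d1 = {"name": "firstname", "value": "bob"}
d2 = {"name": "firstname", "value": "charlie"}

if d1["name"] == d2["name"]:
    # Same name: copy the name once and join only the values
    d3 = {"name": d1["name"], "value": "{}, {}".format(d1["value"], d2["value"])}
else:
    # Different names: nothing to merge, the entries stay separate
    d3 = None

print(d3)  # {'name': 'firstname', 'value': 'bob, charlie'}
```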
Upvotes: 0