Reputation: 904
I'm trying to merge objects based on the key specs. Most of the key structure is consistent. The merge should only happen if company_name is the same (in this example, I only have one company_name) and only if name, color, type, license and description are equal across multiple lists.
[
  {
    "company_name": "GreekNLC",
    "metadata": [
      {
        "name": "Bob",
        "details": [
          {
            "color": "black",
            "type": "bmw",
            "license": "4DFLK",
            "specs": [
              {
                "properties": [
                  {
                    "info": [
                      "sedan",
                      "germany"
                    ]
                  },
                  {
                    "info": [
                      "drive",
                      "expensive"
                    ]
                  }
                ]
              }
            ],
            "description": "amazing car"
          }
        ]
      },
      {
        "name": "Bob",
        "car_details": [
          {
            "color": "black",
            "type": "bmw",
            "license": "4DFLK",
            "specs": [
              {
                "properties": [
                  {
                    "info": [
                      "powerful",
                      "convertable"
                    ]
                  },
                  {
                    "info": [
                      "drive",
                      "expensive"
                    ]
                  }
                ]
              }
            ],
            "description": "amazing car"
          }
        ]
      }
    ]
  }
]
I expect the following output:
[
  {
    "company_name": "GreekNLC",
    "metadata": [
      {
        "name": "Bob",
        "details": [
          {
            "color": "black",
            "type": "bmw",
            "license": "4DFLK",
            "specs": [
              {
                "properties": [
                  {
                    "info": [
                      "powerful",
                      "convertable"
                    ]
                  },
                  {
                    "info": [
                      "sedan",
                      "germany"
                    ]
                  },
                  {
                    "info": [
                      "drive",
                      "expensive"
                    ]
                  }
                ]
              }
            ],
            "description": "amazing car"
          }
        ]
      }
    ]
  }
]
Here is the code I have so far:
import json
from itertools import groupby

headers = ['color', 'license', 'type', 'description']


def _key(d):
    return [d.get(i) for i in headers]


def get_specs(b):
    _specs = [c['properties'] for i in b for c in i['specs']]
    return [{"properties": [i for b in _specs for i in b]}]


def merge(d):
    new_merged_list = [[a, list(b)] for a, b in groupby(sorted(d, key=_key), key=_key)]
    k = [{**dict(zip(headers, a)), 'specs': get_specs(b)} for a, b in new_merged_list]
    return k


result = {'name': merge(c.get("details")) for i in data for c in i.get("metadata")}
print(json.dumps(result))
But it does not work. I'm getting this:
{"name": [{"color": "black", "specs": [{"properties": [{"info":
["amazing", "strong"]}]}]}]}
Upvotes: 0
Views: 95
Reputation: 38962
The operation you're looking to perform is similar to a group-by on company_name, name, color, type, license and description.
You can turn each car's spec properties into hashable tuples of key-value pairs, tag each one with that compound key, put the pairs in a set to drop duplicates, group them by the compound key, and rebuild the list.
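The core of that idea can be sketched on toy data first. The records and key names below are hypothetical and only illustrate the pattern: lists are converted to tuples so they can go into a set, and the survivors are grouped with a defaultdict.

```python
from collections import defaultdict

# Hypothetical toy records: (group key, list-valued payload)
records = [
    ("bob-bmw", ["sedan", "germany"]),
    ("bob-bmw", ["drive", "expensive"]),
    ("bob-bmw", ["drive", "expensive"]),  # duplicate of the previous record
    ("amy-audi", ["fast"]),
]

# Lists are unhashable, so store each payload as a tuple before building the set.
unique = {(key, tuple(payload)) for key, payload in records}

# Group the de-duplicated payloads by their key.
grouped = defaultdict(list)
for key, payload in unique:
    grouped[key].append(list(payload))

# Set iteration order is arbitrary, so sort each group for stable output.
merged = {key: sorted(payloads) for key, payloads in grouped.items()}
print(merged)
```

The function for the data in the question follows the same pattern with a longer compound key.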
from collections import defaultdict
from collections.abc import Hashable


def merge_spec_props(company_data):
    # One (compound key, property) pair per property; both parts must be hashable.
    # Assumes every metadata entry uses the 'car_details' key.
    keyed_tuples = (
        ((
            co['company_name'],
            user['name'],
            car_detail['color'],
            car_detail['type'],
            car_detail['license'],
            car_detail['description'],
        ), tuple(
            # Lists are unhashable, so store them as tuples.
            (k, v if isinstance(v, Hashable) else tuple(v))
            for k, v in prop.items()
        ))
        for co in company_data
        for user in co['metadata']
        for car_detail in user['car_details']
        for spec in car_detail['specs']
        for prop in spec['properties']
    )

    # Drop duplicate properties, then group them by the compound key.
    uniq = set(keyed_tuples)
    grouped = defaultdict(list)
    for k, spec in uniq:
        grouped[k].append(spec)

    # Rebuild the nested structure, one top-level entry per compound key.
    merged_lst = [
        {
            'company_name': company_name,
            'metadata': [{
                'name': username,
                'car_details': [{
                    'color': car_color,
                    'type': car_type,
                    'license': car_license,
                    # Keep the "properties" level from the expected output.
                    'specs': [{'properties': [dict(spec) for spec in specs]}],
                    'description': desc
                }]
            }]
        }
        for (company_name, username, car_color, car_type, car_license, desc), specs in grouped.items()
    ]
    return merged_lst
This implementation is very specific to your data, so the function probably has no reuse value for other kinds of data.
If description were different in any of the car_details, that car would be entered under a separate company entry in the result.
Note that this doesn't merge on intermediate fields. One possible approach is to convert the data into a tree and do a postorder traversal to get the merged structure.
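A minimal sketch of merging on intermediate fields is shown here. It uses nested dictionaries as the tree rather than an explicit postorder traversal. The name merge_nested, the CAR_KEYS constant and the details_key parameter are hypothetical, and the sketch assumes every metadata entry uses the same details key. The sample data is a shortened version of the input in the question.

```python
CAR_KEYS = ("color", "type", "license", "description")


def merge_nested(company_data, details_key="car_details"):
    """Merge companies by name, users by name and cars by CAR_KEYS."""
    # company_name -> user name -> car key -> list of unique properties
    tree = {}
    for co in company_data:
        users = tree.setdefault(co["company_name"], {})
        for user in co["metadata"]:
            cars = users.setdefault(user["name"], {})
            for car in user[details_key]:
                props = cars.setdefault(tuple(car[k] for k in CAR_KEYS), [])
                for spec in car["specs"]:
                    for prop in spec["properties"]:
                        if prop not in props:  # keep first-seen order, skip duplicates
                            props.append(prop)
    # Rebuild the nested list structure from the tree.
    return [
        {
            "company_name": company,
            "metadata": [
                {
                    "name": name,
                    details_key: [
                        {**dict(zip(CAR_KEYS, key)), "specs": [{"properties": props}]}
                        for key, props in cars.items()
                    ],
                }
                for name, cars in users.items()
            ],
        }
        for company, users in tree.items()
    ]


def _car(*infos):
    # Hypothetical helper that builds one car entry for the sample data.
    return {
        "color": "black", "type": "bmw", "license": "4DFLK",
        "specs": [{"properties": [{"info": list(i)} for i in infos]}],
        "description": "amazing car",
    }


data = [{
    "company_name": "GreekNLC",
    "metadata": [
        {"name": "Bob", "car_details": [_car(("sedan", "germany"), ("drive", "expensive"))]},
        {"name": "Bob", "car_details": [_car(("powerful", "convertable"), ("drive", "expensive"))]},
    ],
}]

result = merge_nested(data)
print(result)
```

Unlike the set-based version, this keeps one entry per company and per user, and it preserves the order in which properties were first seen, which may differ from the order in the expected output.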
Upvotes: 1