Reputation: 307
I have a list of dictionaries, one per JSON line, as below.
my_dict = [{'processId': 'p1', 'userId': 'user1', 'reportName': 'report1', 'threadId': '12234', 'some_other_keys': 'respective values.12234'}, {'userId': 'user1', 'processId': 'p1', 'reportName': 'report1', 'threadId': '12335', 'some_other_keys': 'respective values.12335', 'another_key': 'another_value.12335','key1': 'key1_value.12335'}, {'processId': 'p1', 'userId': 'user1', 'reportName': 'report1', 'threadId': '12834', 'some_other_keys': 'respective values.12834','key2': 'key2_value.12834'}]
Note: different JSON lines have different sets of keys.
In these lines, 'processId': 'p1', 'userId': 'user1' and 'reportName': 'report1' are the same for every line, and this is known to the programmer.
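Objective: merge all the lines into a single JSON record, keeping the following common keys once and suffixing every other key with the index of its line: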
["processId","userId","reportName"]
Output:
The expected output for the above input is a single JSON record, as below.
{"processId": "p1", "userId": "user1", "reportName": "report1", "threadId_0": "12234", "some_other_keys_0": "respective values.12234", "threadId_1": "12335", "some_other_keys_1": "respective values.12335", "another_key_1": "another_value.12335","key1_1": "key1_value.12335", "threadId_2": "12834", "some_other_keys_2": "respective values.12834","key2_2": "key2_value.12834"}
My current code looks like below:
def multijson_to_singlejson_matchingkey(list_json, list_keys):
    rec0 = {}
    for l in range(len(list_keys)):
        key0 = list_keys[l]
        value0 = list_json[0][key0]
        rec0[f'{key0}'] = value0
    rec = {}
    for i in range(len(list_json)):
        line = list_json[i]
        for j in range(len(list_keys)):
            del line[list_keys[j]]
        line_keys = list(line)
        for k in range(len(line_keys)):
            key_a = line_keys[k] + "_" + f"{i}"
            line[f'{key_a}'] = line[f'{line_keys[k]}']
            del line[f'{line_keys[k]}']
        rec = {**rec, **line}
    res = {}
    res = {**rec0, **rec}
    print(res)
    return res
But this function takes 20 lines of code. I'm trying to reduce the number of lines and make it more efficient. I need help with the available options for doing that.
Upvotes: 1
Views: 68
Reputation: 13079
You can simplify the generation of rec0 to a hopefully reasonably readable one-liner, and then loop over the list of input dictionaries to populate the rest, ignoring any keys that are in list_keys (although testing here equivalently against rec0, as it is marginally faster):
def multi_to_single(list_json, list_keys):
    rec0 = dict((key0, list_json[0][key0]) for key0 in list_keys)
    res = rec0.copy()
    for i, dct in enumerate(list_json):
        for k, v in dct.items():
            if k not in rec0:
                res[f'{k}_{i}'] = v
    print(res)
    return res
This gives (with pprint.pprint here instead of print for ease of reading):
{'another_key_1': 'another_value.12335',
 'key1_1': 'key1_value.12335',
 'key2_2': 'key2_value.12834',
 'processId': 'p1',
 'reportName': 'report1',
 'some_other_keys_0': 'respective values.12234',
 'some_other_keys_1': 'respective values.12335',
 'some_other_keys_2': 'respective values.12834',
 'threadId_0': '12234',
 'threadId_1': '12335',
 'threadId_2': '12834',
 'userId': 'user1'}
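If fewer lines are the main goal, the same idea could be condensed further with comprehensions. The following is only a sketch under the same assumptions as above (every key in list_keys is present in the first record); the name merge_records and the shortened sample input are hypothetical, chosen for illustration. It also drops the print call so the function only returns the result.

```python
def merge_records(list_json, list_keys):
    """Merge a list of dicts into one dict (hypothetical helper name)."""
    common = set(list_keys)
    # Shared keys are taken once, from the first record.
    merged = {key: list_json[0][key] for key in list_keys}
    # Every other key is suffixed with the index of the record it came from.
    merged.update(
        (f"{key}_{i}", value)
        for i, record in enumerate(list_json)
        for key, value in record.items()
        if key not in common
    )
    return merged


# Shortened sample input, for illustration only.
my_dict = [
    {'processId': 'p1', 'userId': 'user1', 'reportName': 'report1',
     'threadId': '12234'},
    {'userId': 'user1', 'processId': 'p1', 'reportName': 'report1',
     'threadId': '12335', 'key1': 'key1_value.12335'},
]
result = merge_records(my_dict, ["processId", "userId", "reportName"])
```

Like multi_to_single, this leaves the input dictionaries untouched, whereas the function in the question deletes and renames keys in place, so the caller's list is modified after the call.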
Upvotes: 1