Swap

Reputation: 307

Creating a single JSON record from multiple JSON records (performance-efficient coding practice)

I have a list of dictionaries, one per JSON line, as below.

my_dict = [{'processId': 'p1', 'userId': 'user1', 'reportName': 'report1', 'threadId': '12234', 'some_other_keys': 'respective values.12234'}, {'userId': 'user1', 'processId': 'p1', 'reportName': 'report1', 'threadId': '12335', 'some_other_keys': 'respective values.12335', 'another_key': 'another_value.12335','key1': 'key1_value.12335'}, {'processId': 'p1', 'userId': 'user1', 'reportName': 'report1', 'threadId': '12834', 'some_other_keys': 'respective values.12834','key2': 'key2_value.12834'}]

Note: different JSON lines have different sets of keys.

In these lines, 'processId': 'p1', 'userId': 'user1' and 'reportName': 'report1' are the same for every line, and this is known to the programmer.

Objective:

  1. Write a function to create a single JSON line out of the above.
  2. The function arguments are:
    1. the list of matching keys, i.e. ["processId","userId","reportName"]
    2. the list of dictionaries shown above.

Output:

The expected output for the above input is a single JSON record, as below.

{"processId": "p1", "userId": "user1", "reportName": "report1", "threadId_0": "12234", "some_other_keys_0": "respective values.12234", "threadId_1": "12335", "some_other_keys_1": "respective values.12335", "another_key_1": "another_value.12335","key1_1": "key1_value.12335", "threadId_2": "12834", "some_other_keys_2": "respective values.12834","key2_2": "key2_value.12834"}

My current code looks like this:

def multijson_to_singlejson_matchingkey(list_json, list_keys):
    rec0 = {}
    for l in range(len(list_keys)):
        key0 = list_keys[l]
        value0 = list_json[0][key0]
        rec0[f'{key0}'] = value0
    rec = {}
    for i in range(len(list_json)):
        line = list_json[i]
        for j in range(len(list_keys)):
            del line[list_keys[j]]
        line_keys = list(line)
        for k in range(len(line_keys)):
            key_a = line_keys[k] + "_" + f"{i}"
            line[f'{key_a}'] = line[f'{line_keys[k]}']
            del line[f'{line_keys[k]}']
        rec = {**rec, **line}
    res = {}
    res = {**rec0, **rec}
    print(res)
    return res

But this function takes 20 lines of code. I'm trying to reduce the number of lines and make it more efficient. I need help with the available options for doing that.

Upvotes: 1

Views: 68

Answers (1)

alani

Reputation: 13079

You can reduce the generation of rec0 to a (hopefully reasonably readable) one-liner, and then loop over the list of input dictionaries to populate the rest, skipping any keys that are in list_keys. The membership test is done against rec0 rather than list_keys: the two are equivalent, but a dictionary lookup is marginally faster than a list search:

def multi_to_single(list_json, list_keys):
    rec0 = dict((key0, list_json[0][key0]) for key0 in list_keys)
    res = rec0.copy()
    for i, dct in enumerate(list_json):
        for k, v in dct.items():
            if k not in rec0:
                res[f'{k}_{i}'] = v
    print(res)
    return res

This gives (with pprint.pprint here instead of print for ease of reading):

{'another_key_1': 'another_value.12335',
 'key1_1': 'key1_value.12335',
 'key2_2': 'key2_value.12834',
 'processId': 'p1',
 'reportName': 'report1',
 'some_other_keys_0': 'respective values.12234',
 'some_other_keys_1': 'respective values.12335',
 'some_other_keys_2': 'respective values.12834',
 'threadId_0': '12234',
 'threadId_1': '12335',
 'threadId_2': '12834',
 'userId': 'user1'}
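
If fewer lines are the main goal, the same idea can also be written with comprehensions. The following is only a sketch, not a benchmarked improvement: the name multi_to_single_json and the shortened sample records are made up for illustration, and it assumes (as in the question) that every shared key is present in the first record. Like the function above, it builds new dictionaries and does not delete anything from the input records, whereas the code in the question removes keys from them with del. The json.dumps call at the end is one way to turn the merged dictionary into an actual JSON line.

```python
import json


def multi_to_single_json(list_json, list_keys):
    """Merge records that share list_keys into one flat dict (illustrative helper)."""
    # Shared keys are taken once, from the first record.
    shared = {key: list_json[0][key] for key in list_keys}
    # Every other key gets the index of its source record as a suffix.
    suffixed = {
        f"{key}_{i}": value
        for i, record in enumerate(list_json)
        for key, value in record.items()
        if key not in shared
    }
    return {**shared, **suffixed}


# Shortened sample input, modelled on the data in the question.
records = [
    {"processId": "p1", "userId": "user1", "reportName": "report1",
     "threadId": "12234"},
    {"userId": "user1", "processId": "p1", "reportName": "report1",
     "threadId": "12335", "key1": "key1_value.12335"},
]

merged = multi_to_single_json(records, ["processId", "userId", "reportName"])
print(json.dumps(merged))  # serialise the merged dict as one JSON line
```

Note that the suffix scheme can still collide if a record already has a key that looks like another record's suffixed key (for example a literal 'threadId_1'); neither version guards against that.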

Upvotes: 1
