JohnnyCc

Reputation: 525

Python: grouping dictionaries in an array

I'm trying to summarize and group multiple dictionaries in an array into a single dictionary, based on the dictionaries' values. I also want the count of the grouped values.

For example, in the raw 'sms' input, I intend to group the entries by subscriber_name and endpoint, collect their errorId values, and get the count.

I don't have much idea where to start, and I'm hoping for some guidelines here, including any library that could be used.

Desired payload format:

{
    "myDeviceX": {
        "channel": {
            "sms": [
                {
                    "endpoint": "+123456789",
                    "errorId": ["1","2","3","4"],
                    "error_num": 4,
                    "subscriber_name": "tester1"
                },
                {
                    "endpoint": "+234567890",
                    "errorId": ["1"],
                    "error_num": 1,
                    "subscriber_name": "tester2"
                }
            ],
            "email": [
                {
                    "endpoint": "[email protected]",
                    "errorId": ["1","2","3"],
                    "error_num": 3,
                    "subscriber_name": "tester1"
                }
            ]
        }
    }
}

Raw input payload format:

{
    "myDeviceX": {
        "sms": [
            {
                "endpoint": "+123456789",
                "errorId": "1",
                "subscriber_name": "tester1"
            },
            {
                "endpoint": "+123456789",
                "errorId": "2",
                "subscriber_name": "tester1"
            },
            {
                "endpoint": "+123456789",
                "errorId": "3",
                "subscriber_name": "tester1"
            },
            {
                "endpoint": "+123456789",
                "errorId": "4",
                "subscriber_name": "tester1"
            },
            {
                "endpoint": "+234567890",
                "errorId": "1",
                "subscriber_name": "tester2"
            }
        ],
        "email": [
            {
                "endpoint": "[email protected]",
                "errorId": "1",
                "subscriber_name": "tester1"
            },
            {
                "endpoint": "[email protected]",
                "errorId": "2",
                "subscriber_name": "tester1"
            },
            {
                "endpoint": "[email protected]",
                "errorId": "3",
                "subscriber_name": "tester1"
            }
        ]
    }
}

Upvotes: 0

Views: 92

Answers (1)

jeremye

Reputation: 1388

For educational purposes, I'm going to present two different solutions, first the most straightforward and then a "pythonic" approach (which is not necessarily better at all).

First, let's assume the initial input (given in the question) is stored in a variable initial_data. Then 1) for each device, create a new object for that device, 2) for each channel in that device, create a new list for that channel, and 3) group all of the items in that channel by endpoint and subscriber name, and add a new object for each endpoint to the list we created for the channel.

import itertools

output = {}

# Look at each device and its channels
for device, channels in initial_data.items():
    output[device] = {'channel': {}}  # create new object for the device

    # For each channel, we can process its items by endpoints and subscribers
    for channel, entries in channels.items():
        output[device]['channel'][channel] = []  # create a new list for each channel

        # groups consecutive entries by an endpoint-subscriber_name pair
        for k, g in itertools.groupby(entries, key=lambda x: (x['endpoint'], x['subscriber_name'])):
            error_ids = [x['errorId'] for x in g]  # g is an iterator, so read it exactly once
            output[device]['channel'][channel].append({
                'endpoint': k[0],  # the endpoint
                'subscriber_name': k[1],  # the subscriber name
                'error_num': len(error_ids),
                'errorId': error_ids
            })

# Output is now in the desired format!

and we are done!
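
One caveat worth noting: itertools.groupby only merges adjacent items that share a key, so the loop above relies on the entries for each endpoint being consecutive, as they are in the question's sample data. If that is not guaranteed, sorting by the same key first is a common approach. A minimal sketch, using made-up entries and a hypothetical helper named pair:

```python
import itertools

# Hypothetical entries in which the same endpoint is NOT consecutive
entries = [
    {"endpoint": "+1", "errorId": "1", "subscriber_name": "a"},
    {"endpoint": "+2", "errorId": "1", "subscriber_name": "b"},
    {"endpoint": "+1", "errorId": "2", "subscriber_name": "a"},
]


def pair(entry):
    """Grouping key: the (endpoint, subscriber_name) pair."""
    return (entry["endpoint"], entry["subscriber_name"])


# Without sorting, the "+1" entries end up in two separate groups
unsorted_groups = [
    (key, [e["errorId"] for e in group])
    for key, group in itertools.groupby(entries, key=pair)
]

# Sorting by the same key first makes equal keys adjacent
sorted_groups = [
    (key, [e["errorId"] for e in group])
    for key, group in itertools.groupby(sorted(entries, key=pair), key=pair)
]

print(len(unsorted_groups), len(sorted_groups))
```

Sorting changes the order of the groups relative to the input, so it is only appropriate if that order does not matter to you.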

The following is a "pythonic" approach using (perhaps too many) dict and list comprehensions, for demonstration purposes if nothing else:

output = {
    device: {'channel': {
        channel: [
            {
                'endpoint': k[0],
                'subscriber_name': k[1],
                'error_num': len(error_ids),
                'errorId': error_ids
            }
            for k, g in itertools.groupby(entries, key=lambda x: (x['endpoint'], x['subscriber_name']))
            for error_ids in [[x['errorId'] for x in g]]  # read each group iterator exactly once
        ]
        for channel, entries in channels.items()
    }}
    for device, channels in initial_data.items()
}

This approach essentially just flips all the loops. You might find that all these nested comprehensions are a bit unwieldy, but maybe the best solution lies somewhere between the two.
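
As one possible middle ground, the grouping step could be pulled into a small helper and the nesting left to a single dict comprehension. This is only a sketch: the names summarize and regroup are made up for illustration, the sample data is a shortened version of the question's input, and the helper sorts the entries first, which is an assumption that the order of the groups does not matter.

```python
import itertools


def summarize(entries):
    """Group one channel's entries by (endpoint, subscriber_name)."""
    def pair(entry):
        return (entry["endpoint"], entry["subscriber_name"])

    summary = []
    # Sort first so that entries with equal keys are adjacent for groupby
    for (endpoint, name), group in itertools.groupby(sorted(entries, key=pair), key=pair):
        error_ids = [entry["errorId"] for entry in group]  # read the group once
        summary.append({
            "endpoint": endpoint,
            "errorId": error_ids,
            "error_num": len(error_ids),
            "subscriber_name": name,
        })
    return summary


def regroup(data):
    """Build the {device: {'channel': {channel: [...]}}} structure."""
    return {
        device: {"channel": {
            channel: summarize(entries)
            for channel, entries in channels.items()
        }}
        for device, channels in data.items()
    }


# Shortened version of the question's raw input
sample = {
    "myDeviceX": {
        "sms": [
            {"endpoint": "+123456789", "errorId": "1", "subscriber_name": "tester1"},
            {"endpoint": "+123456789", "errorId": "2", "subscriber_name": "tester1"},
            {"endpoint": "+234567890", "errorId": "1", "subscriber_name": "tester2"},
        ]
    }
}

result = regroup(sample)
print(result)
```

Keeping the grouping logic in one function makes it easier to test on its own, while the outer comprehension stays short enough to read at a glance.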

Upvotes: 2
