Johnny John Boy

Reputation: 3222

How to remove duplicate values from a list of Python dictionaries where order is preserved?

I am trying to turn a list of at most 30 Python dictionaries containing duplicate values into a summarised list. A further complication is that the list is ordered by date/time, oldest first, and the summarised list must keep only the newest occurrence of each dictionary (duplicates are identified by client ID).

data = [
    {
        "client": {
            "id": "12345"
        },
        "name": "John",
        "date": "18-10-2021 12:31:08"
    },
    {
        "client": {
            "id": "12345"
        },
        "name": "John",
        "date": "18-10-2021 12:31:19"
    },
    {
        "client": {
            "id": "12345"
        },
        "name": "John",
        "date": "18-10-2021 12:31:25"
    },
    {
        "client": {
            "id": "23456"
        },
        "name": "Simon",
        "date": "18-10-2021 12:32:48"
    },
    {
        "client": {
            "id": "23456"
        },
        "name": "Simon",
        "date": "18-10-2021 12:33:12"
    },
    {
        "client": {
            "id": "34567"
        },
        "name": "Bob",
        "date": "18-10-2021 12:34:15"
    },
    {
        "client": {
            "id": "34567"
        },
        "name": "Bob",
        "date": "18-10-2021 12:34:34"
    }
]

summarised_ids = []
summarised_messages = []

for message in data[::-1]:
    if message['client']['id'] not in summarised_ids:
        summarised_ids.append(message['client']['id'])

for message in data[::-1]:
    if message['client']['id'] in summarised_ids:
        summarised_messages.append(message)
        summarised_ids.remove(message['client']['id'])

for message in summarised_messages:
    print(message)

{'client': {'id': '34567'}, 'name': 'Bob', 'date': '18-10-2021 12:34:34'}
{'client': {'id': '23456'}, 'name': 'Simon', 'date': '18-10-2021 12:33:12'}
{'client': {'id': '12345'}, 'name': 'John', 'date': '18-10-2021 12:31:25'}

Currently it's very verbose, and I don't know how to reduce these steps:

  1. Reverse iterate through the original list and add each ID to a new summarised_ids list if it's not already there

  2. Reverse iterate through the original list again and append the message if its ID is in the summarised_ids list, then remove that ID from summarised_ids

  3. Ignore the message if its ID has already been removed (an older duplicate)

  4. Print the summarised_messages list
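The steps above can be collapsed into a single reverse pass. The following is a minimal sketch, not the only way to do it: the function name `summarise`, the `seen_ids` set, and the shortened `sample` list are hypothetical names introduced for illustration, and it assumes every dictionary has a `client` entry with an `id`, as in the question's data.

```python
def summarise(messages):
    """Keep the newest message per client ID, newest first.

    Assumes `messages` is ordered oldest first, as described in the question.
    """
    seen_ids = set()   # IDs that already have a message in the result
    summarised = []
    for message in reversed(messages):   # walk from newest to oldest
        client_id = message["client"]["id"]
        if client_id in seen_ids:
            continue                     # older duplicate, skip it
        seen_ids.add(client_id)
        summarised.append(message)
    return summarised


# Shortened, hypothetical sample in the same shape as the question's data.
sample = [
    {"client": {"id": "12345"}, "name": "John", "date": "18-10-2021 12:31:08"},
    {"client": {"id": "12345"}, "name": "John", "date": "18-10-2021 12:31:25"},
    {"client": {"id": "23456"}, "name": "Simon", "date": "18-10-2021 12:33:12"},
]

for message in summarise(sample):
    print(message)
```

Using a set for the membership check avoids building the ID list in a separate pass, so the input is only traversed once.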

Upvotes: 0

Views: 82

Answers (1)

Dani Mesejo

Reputation: 61910

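Try using a dictionary to deduplicate the list. Each later entry overwrites the earlier one stored under the same client ID, so the newest occurrence is kept, while the keys stay in first-seen order (dictionaries preserve insertion order from Python 3.7):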

result = list({ d["client"]["id"] : d for d in data}.values())
for row in result:
    print(row)

Output

{'client': {'id': '12345'}, 'name': 'John', 'date': '18-10-2021 12:31:25'}
{'client': {'id': '23456'}, 'name': 'Simon', 'date': '18-10-2021 12:33:12'}
{'client': {'id': '34567'}, 'name': 'Bob', 'date': '18-10-2021 12:34:34'}

To match your exact output (newest first), reverse the resulting list:

result = list({d["client"]["id"]: d for d in data}.values())[::-1]
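One caveat, offered as a hedged sketch: reversing the comprehension's result gives the same order as the question's loops only when duplicates of an ID sit next to each other, as they do in the question's data. If IDs were interleaved, the comprehension would keep each ID in its first-seen position. The `newest_first` helper and the `interleaved` sample below are hypothetical and only illustrate the difference; `setdefault` over the reversed list keeps the first message seen per ID, which is the newest one.

```python
def newest_first(messages):
    """Newest message per client ID, ordered by newest occurrence.

    Assumes `messages` is ordered oldest first.
    """
    newest = {}
    for message in reversed(messages):
        # setdefault only stores the message if the ID is not present yet,
        # so the first (newest) message seen for each ID wins.
        newest.setdefault(message["client"]["id"], message)
    return list(newest.values())


# Hypothetical data where client "A" appears before and after client "B".
interleaved = [
    {"client": {"id": "A"}, "date": "18-10-2021 12:00:00"},
    {"client": {"id": "B"}, "date": "18-10-2021 12:01:00"},
    {"client": {"id": "A"}, "date": "18-10-2021 12:02:00"},
]

by_comprehension = list({d["client"]["id"]: d for d in interleaved}.values())[::-1]
by_reverse_pass = newest_first(interleaved)

print([d["client"]["id"] for d in by_comprehension])  # ['B', 'A']
print([d["client"]["id"] for d in by_reverse_pass])   # ['A', 'B']
```

Both versions keep the 12:02:00 message for client "A"; only the order of the result differs.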

Upvotes: 1
