fateme.ata

Reputation: 21

Remove dictionaries with identical values for a specific key considering another value in the dictionary in Python

I have a JSON file containing a list of dictionaries.

I want to remove the duplicate dicts based on the "book_id" key, keeping only the one with the maximum value for the "id" key.

example:

book_list = [
{"id": 20, "user_id": 8, "device_id": 1, "book_id": 81},
{"id": 19, "user_id": 8, "device_id": 1, "book_id": 81},
{"id": 18, "user_id": 8, "device_id": 1, "book_id": 81},
{"id": 17, "user_id": 8, "device_id": 1, "book_id": 78},
{"id": 16, "user_id": 8, "device_id": 1, "book_id": 77},
{"id": 15, "user_id": 8, "device_id": 1, "book_id": 77},
{"id": 13, "user_id": 8, "device_id": 1, "book_id": 75}]

This is what I want:

new_list = [
{'id': 20, 'user_id': 8, 'device_id': 1, 'book_id': 81},
{'id': 17, 'user_id': 8, 'device_id': 1, 'book_id': 78}, 
{'id': 16, 'user_id': 8, 'device_id': 1, 'book_id': 77}, 
{'id': 13, 'user_id': 8, 'device_id': 1, 'book_id': 75}]

Executing the code below on this example list works perfectly and gives the desired result, but running it on my JSON file data causes problems: it cannot group the entries correctly, so it removes some of the duplicates, but not all of them:

from itertools import groupby

newlist = []
for key, value in groupby(book_list, key=lambda v: v["book_id"]):
    newlist.append(max(value, key=lambda v: v["id"]))
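One thing I noticed while debugging: itertools.groupby only merges *consecutive* items with equal keys, so if the real JSON data is not already ordered by "book_id", duplicates that sit in separate runs survive. Sorting by "book_id" first (a sketch on a shortened, shuffled sample) seems to fix that:

```python
from itertools import groupby

book_list = [
    {"id": 19, "user_id": 8, "device_id": 1, "book_id": 81},
    {"id": 16, "user_id": 8, "device_id": 1, "book_id": 77},
    {"id": 20, "user_id": 8, "device_id": 1, "book_id": 81},
    {"id": 15, "user_id": 8, "device_id": 1, "book_id": 77},
]

# groupby only groups consecutive equal keys, so sort by book_id first
book_list.sort(key=lambda v: v["book_id"])

newlist = [
    max(group, key=lambda v: v["id"])
    for _, group in groupby(book_list, key=lambda v: v["book_id"])
]
print(newlist)  # one entry per book_id: ids 16 (book 77) and 20 (book 81)
```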

I also tried the code below; it removes all the duplicates, but it keeps the one with the minimum id (each duplicate actually overwrites the previous one, so the last entry wins):

    new_list = list({v['book_id']: v for v in book_list}.values())

I would prefer a solution that is as short and efficient as possible.

Upvotes: 0

Views: 75

Answers (3)

user7864386

Reputation:

Use dict.setdefault to build a dict of dicts of lists. Then loop over the values of the outer dict and take the entry under the max key of each inner dict:

temp = {}
for d in book_list:
    temp.setdefault(d['book_id'], {}).setdefault(d['id'], []).append(d)
new_list = [v[max(v)] for v in temp.values()]

Output:

[[{'id': 20, 'user_id': 8, 'device_id': 1, 'book_id': 81}],
 [{'id': 17, 'user_id': 8, 'device_id': 1, 'book_id': 78}],
 [{'id': 16, 'user_id': 8, 'device_id': 1, 'book_id': 77}],
 [{'id': 13, 'user_id': 8, 'device_id': 1, 'book_id': 75}]]

Then again, the list looks already sorted by descending id, so we could simply build a dict over the reversed list (so the largest id overwrites the others) and reverse the result back:

new_list = list({d['book_id']: d for d in book_list[::-1]}.values())[::-1]
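A quick sanity check of that one-liner on the sample data. Note it relies on book_list being sorted by descending id, so the last duplicate seen in the reversed scan is always the one with the largest id:

```python
book_list = [
    {"id": 20, "user_id": 8, "device_id": 1, "book_id": 81},
    {"id": 19, "user_id": 8, "device_id": 1, "book_id": 81},
    {"id": 18, "user_id": 8, "device_id": 1, "book_id": 81},
    {"id": 17, "user_id": 8, "device_id": 1, "book_id": 78},
    {"id": 16, "user_id": 8, "device_id": 1, "book_id": 77},
    {"id": 15, "user_id": 8, "device_id": 1, "book_id": 77},
    {"id": 13, "user_id": 8, "device_id": 1, "book_id": 75},
]

# later duplicates overwrite earlier ones in the dict comprehension, so
# scanning the reversed list keeps the max-id entry per book_id; the
# final [::-1] restores the original (descending-id) order
new_list = list({d["book_id"]: d for d in book_list[::-1]}.values())[::-1]
print(new_list)  # ids 20, 17, 16, 13 -- one entry per book_id
```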

Upvotes: 1

Alain T.

Reputation: 42133

If you sort the list by 'id' and then build a temporary dictionary keyed by 'book_id', only the dictionary with the largest id is kept as the value for each duplicate 'book_id' key:

ids = {b['book_id']:b for b in sorted(book_list,key=lambda b:b['id'])}
book_list = list(ids.values())

print(*book_list,sep='\n')
{'id': 13, 'user_id': 8, 'device_id': 1, 'book_id': 75}
{'id': 16, 'user_id': 8, 'device_id': 1, 'book_id': 77}
{'id': 17, 'user_id': 8, 'device_id': 1, 'book_id': 78}
{'id': 20, 'user_id': 8, 'device_id': 1, 'book_id': 81}
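If the original descending-id order matters, reversing the result restores it, since the temporary dict's values come out in ascending-id insertion order (a small variation on the same idea):

```python
book_list = [
    {"id": 20, "user_id": 8, "device_id": 1, "book_id": 81},
    {"id": 19, "user_id": 8, "device_id": 1, "book_id": 81},
    {"id": 18, "user_id": 8, "device_id": 1, "book_id": 81},
    {"id": 17, "user_id": 8, "device_id": 1, "book_id": 78},
    {"id": 16, "user_id": 8, "device_id": 1, "book_id": 77},
    {"id": 15, "user_id": 8, "device_id": 1, "book_id": 77},
    {"id": 13, "user_id": 8, "device_id": 1, "book_id": 75},
]

# sort ascending by id so the largest id is inserted last per book_id,
# then reverse the values to get back to descending-id order
ids = {b["book_id"]: b for b in sorted(book_list, key=lambda b: b["id"])}
result = list(ids.values())[::-1]
print(*result, sep="\n")  # ids 20, 17, 16, 13
```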

Upvotes: 0

Delta

Reputation: 370

Try this:

book_list = [
    {"id": 20, "user_id": 8, "device_id": 1, "book_id": 81},
    {"id": 19, "user_id": 8, "device_id": 1, "book_id": 81},
    {"id": 18, "user_id": 8, "device_id": 1, "book_id": 81},
    {"id": 17, "user_id": 8, "device_id": 1, "book_id": 78},
    {"id": 16, "user_id": 8, "device_id": 1, "book_id": 77},
    {"id": 15, "user_id": 8, "device_id": 1, "book_id": 77},
    {"id": 13, "user_id": 8, "device_id": 1, "book_id": 75}
]
new_book_list = []
_book_id_list = set()  # book_ids seen so far (approach suggested by "MYousefi")

for book in book_list:
    if book['book_id'] not in _book_id_list:
        _book_id_list.add(book['book_id'])  # don't accept a book twice
        new_book_list.append(book)

print(f'{new_book_list = }')

Thanks, MYousefi.
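Worth noting: this first-seen-wins loop returns the max-id entry only because book_list happens to be sorted by descending "id". If that ordering is not guaranteed in the real data (an assumption here), sorting first makes the loop order-independent:

```python
# a deliberately shuffled sample to show the ordering dependence
shuffled = [
    {"id": 15, "user_id": 8, "device_id": 1, "book_id": 77},
    {"id": 20, "user_id": 8, "device_id": 1, "book_id": 81},
    {"id": 16, "user_id": 8, "device_id": 1, "book_id": 77},
    {"id": 19, "user_id": 8, "device_id": 1, "book_id": 81},
]

seen = set()
new_book_list = []
# scan in descending-id order so the first entry seen per book_id is the max
for book in sorted(shuffled, key=lambda b: b["id"], reverse=True):
    if book["book_id"] not in seen:
        seen.add(book["book_id"])
        new_book_list.append(book)
print(new_book_list)  # ids 20 (book 81) and 16 (book 77)
```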

Upvotes: 2
