Reputation: 21
I have a JSON file which is a list of dictionaries.
I want to remove the duplicate dicts based on the "book_id" key and keep just the one with the max value for the "id" key.
example:
book_list = [
{"id": 20, "user_id": 8, "device_id": 1, "book_id": 81},
{"id": 19, "user_id": 8, "device_id": 1, "book_id": 81},
{"id": 18, "user_id": 8, "device_id": 1, "book_id": 81},
{"id": 17, "user_id": 8, "device_id": 1, "book_id": 78},
{"id": 16, "user_id": 8, "device_id": 1, "book_id": 77},
{"id": 15, "user_id": 8, "device_id": 1, "book_id": 77},
{"id": 13, "user_id": 8, "device_id": 1, "book_id": 75}]
What I want is something like this:
new_list = [
{'id': 20, 'user_id': 8, 'device_id': 1, 'book_id': 81},
{'id': 17, 'user_id': 8, 'device_id': 1, 'book_id': 78},
{'id': 16, 'user_id': 8, 'device_id': 1, 'book_id': 77},
{'id': 13, 'user_id': 8, 'device_id': 1, 'book_id': 75}]
Executing the code below with this example list works perfectly well and I get the desired result, but running it on my JSON file data causes problems: it cannot group them correctly, so it removes some of the duplicate values but not all of them:
from itertools import groupby

newlist = []
for key, value in groupby(book_list, key=lambda v: v["book_id"]):
    newlist.append(max(list(value), key=lambda v: v["id"]))
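(For what it's worth, itertools.groupby only groups consecutive items with equal keys, so if the JSON data is not already ordered by book_id, a run of equal keys gets split and some duplicates survive. A minimal sketch of the same approach with a sort first, assuming that is what differs in the real data:)
newlist = []
by_book = sorted(book_list, key=lambda v: v["book_id"])  # groupby needs adjacent equal keys
for key, value in groupby(by_book, key=lambda v: v["book_id"]):
    newlist.append(max(value, key=lambda v: v["id"]))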
I also tried this code; it removes all the duplicates, but it keeps the one with the minimum id (it actually overwrites with the last dictionary seen for each book_id):
new_list = list({v['book_id']: v for v in book_list}.values())
I would prefer lines of code that can do this in O(1) (if possible).
Upvotes: 0
Views: 75
Reputation:
Use dict.setdefault to create a dict of dicts of lists. Loop over the values of the outer dict and find the max key of each inner dict:
temp = {}
for d in book_list:
    temp.setdefault(d['book_id'], {}).setdefault(d['id'], []).append(d)

new_list = [v[max(v)] for v in temp.values()]
Output:
[[{'id': 20, 'user_id': 8, 'device_id': 1, 'book_id': 81}],
[{'id': 17, 'user_id': 8, 'device_id': 1, 'book_id': 78}],
[{'id': 16, 'user_id': 8, 'device_id': 1, 'book_id': 77}],
[{'id': 13, 'user_id': 8, 'device_id': 1, 'book_id': 75}]]
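(If you'd rather have flat dicts than the one-element lists shown above, you can index into the inner list; assuming each id occurs only once, it is a small tweak of the same comprehension:)
new_list = [v[max(v)][0] for v in temp.values()]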
Then again, it looks like the list is already sorted by id in descending order, so we could reverse book_list, build the dict from it (so the entry with the max id wins for each book_id), and then reverse the result back:
new_list = list({d['book_id']: d for d in book_list[::-1]}.values())[::-1]
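Running this on the example list should give back exactly the desired new_list:
[{'id': 20, 'user_id': 8, 'device_id': 1, 'book_id': 81},
 {'id': 17, 'user_id': 8, 'device_id': 1, 'book_id': 78},
 {'id': 16, 'user_id': 8, 'device_id': 1, 'book_id': 77},
 {'id': 13, 'user_id': 8, 'device_id': 1, 'book_id': 75}]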
Upvotes: 1
Reputation: 42133
If you sort the list by 'id' and then build a temporary dictionary keyed on 'book_id', only the dictionary with the largest id will be kept as the value for each duplicate 'book_id' key:
ids = {b['book_id']: b for b in sorted(book_list, key=lambda b: b['id'])}
book_list = list(ids.values())
print(*book_list, sep='\n')
{'id': 13, 'user_id': 8, 'device_id': 1, 'book_id': 75}
{'id': 16, 'user_id': 8, 'device_id': 1, 'book_id': 77}
{'id': 17, 'user_id': 8, 'device_id': 1, 'book_id': 78}
{'id': 20, 'user_id': 8, 'device_id': 1, 'book_id': 81}
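(If the original descending-by-id order matters, the result can simply be reversed, for example:)
book_list = list(ids.values())[::-1]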
Upvotes: 0
Reputation: 370
Try this:
book_list = [
{"id": 20, "user_id": 8, "device_id": 1, "book_id": 81},
{"id": 19, "user_id": 8, "device_id": 1, "book_id": 81},
{"id": 18, "user_id": 8, "device_id": 1, "book_id": 81},
{"id": 17, "user_id": 8, "device_id": 1, "book_id": 78},
{"id": 16, "user_id": 8, "device_id": 1, "book_id": 77},
{"id": 15, "user_id": 8, "device_id": 1, "book_id": 77},
{"id": 13, "user_id": 8, "device_id": 1, "book_id": 75}
]
new_book_list = []
_book_id_list = set()  # ids added here  # suggested by "MYousefi"
for book in book_list:
    if book['book_id'] not in _book_id_list:
        _book_id_list.add(book['book_id'])  # don't accept a book twice
        new_book_list.append(book)

print(f'{new_book_list = }')
thanks MYousefi
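(Note that this keeps the first entry seen for each book_id, which matches the max id only because the example list is already sorted by id in descending order. A minimal order-independent sketch of the same single-pass idea, tracking the best entry per book_id:)
best = {}
for book in book_list:
    kept = best.get(book['book_id'])
    if kept is None or book['id'] > kept['id']:
        best[book['book_id']] = book  # keep whichever entry has the larger id

new_book_list = list(best.values())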
Upvotes: 2