Reputation: 3
I have a dynamic list:
[{'dashboard': 'AG', 'end_date': '2021-06-17 13:13:43', 'location': 'EC & pH Reading', 'zone_name': 'Zone 1 Left'},
{'dashboard': 'AG', 'end_date': '2021-06-17 12:40:06', 'location': 'Harvest', 'zone_name': 'Zone 2 Left'},
{'dashboard': 'AG', 'end_date': '2021-06-16 15:52:52', 'location': 'Harvest', 'zone_name': 'Zone 1 Left' },
{'dashboard': 'AG', 'end_date': '2021-06-16 15:45:51', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'}]
I want to remove the duplicates based on zone_name and location. There are 3 entries with zone_name 'Zone 1 Left', and I want to remove the older duplicates. I have already sorted the list by end_date, so the latest entry comes first. Now I need to remove the duplicate values based on zone_name and location.
This is what I have tried:
final_zone = []
res_list = []
for i in sortedArray:
    if i["location"] not in final_zone:
        sch.append(i)
        final_zone.append(i["location"])
What do I need to change to remove the duplicates based on both zone_name and location?
That is, Zone 1 Left has 3 entries, and for each location in it I need only the latest one.
Upvotes: 0
Views: 75
Reputation: 73450
For a general approach with an unsorted list:
from itertools import groupby
from operator import itemgetter
# sorting and grouping functions
f_sort = itemgetter("location", "zone_name", "end_date")  # sort keys (descending via reverse=True)
f_group = itemgetter("location", "zone_name")  # keys that define a group
result = [
    next(g) for _, g in  # only take latest of each group
    groupby(sorted(array, key=f_sort, reverse=True), key=f_group)
]
The utilities used here (itertools.groupby, operator.itemgetter, and sorted) are covered in the Python standard library documentation, and all of them are really handy in a lot of use cases.
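To make the approach above easy to try, here is a minimal self-contained sketch run against the question's sample data, shuffled on purpose to show that no pre-sorting is needed. The names records and latest_per_group are only illustrative, and the sketch assumes end_date always uses the fixed-width 'YYYY-MM-DD HH:MM:SS' format, so that comparing the strings orders them chronologically.

```python
from itertools import groupby
from operator import itemgetter

# Sample records from the question, deliberately left unsorted.
records = [
    {'dashboard': 'AG', 'end_date': '2021-06-16 15:45:51', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'},
    {'dashboard': 'AG', 'end_date': '2021-06-17 13:13:43', 'location': 'EC & pH Reading', 'zone_name': 'Zone 1 Left'},
    {'dashboard': 'AG', 'end_date': '2021-06-16 15:52:52', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'},
    {'dashboard': 'AG', 'end_date': '2021-06-17 12:40:06', 'location': 'Harvest', 'zone_name': 'Zone 2 Left'},
]


def latest_per_group(rows):
    """Return the newest row for each (location, zone_name) pair."""
    sort_key = itemgetter("location", "zone_name", "end_date")
    group_key = itemgetter("location", "zone_name")
    # Descending sort puts the newest end_date first inside every group.
    ordered = sorted(rows, key=sort_key, reverse=True)
    # groupby only merges adjacent rows, which the sort guarantees here.
    return [next(group) for _, group in groupby(ordered, key=group_key)]


latest = latest_per_group(records)
for row in latest:
    print(row["location"], "|", row["zone_name"], "|", row["end_date"])
```

Note that groupby only groups consecutive items, which is why the sort key has to start with the same fields as the group key.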
Upvotes: 1
Reputation: 6296
The other answers work, but I want to add a solution using pandas.
You can create a DataFrame from your list of dictionaries:
import pandas as pd
d = [{'dashboard': 'AG', 'end_date': '2021-06-17 13:13:43', 'location': 'EC & pH Reading', 'zone_name': 'Zone 1 Left'},
     {'dashboard': 'AG', 'end_date': '2021-06-17 12:40:06', 'location': 'Harvest', 'zone_name': 'Zone 2 Left'},
     {'dashboard': 'AG', 'end_date': '2021-06-16 15:52:52', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'},
     {'dashboard': 'AG', 'end_date': '2021-06-16 15:45:51', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'}]
df = pd.DataFrame(d)
This is what df looks like:
   dashboard             end_date         location    zone_name
0         AG  2021-06-17 13:13:43  EC & pH Reading  Zone 1 Left
1         AG  2021-06-17 12:40:06          Harvest  Zone 2 Left
2         AG  2021-06-16 15:52:52          Harvest  Zone 1 Left
3         AG  2021-06-16 15:45:51          Harvest  Zone 1 Left
Sort of like a table in Excel.
Now, with one line, you can do exactly what you want:
df.sort_values("end_date").drop_duplicates(["location", "zone_name"], keep="last")
output:
   dashboard             end_date         location    zone_name
2         AG  2021-06-16 15:52:52          Harvest  Zone 1 Left
1         AG  2021-06-17 12:40:06          Harvest  Zone 2 Left
0         AG  2021-06-17 13:13:43  EC & pH Reading  Zone 1 Left
Upvotes: 0
Reputation: 3536
clean_list = []
for elem in lst:
    # check whether an element with the same zone name and location
    # is already present in the clean list
    yet_present = any(el['zone_name'] == elem['zone_name']
                      and el['location'] == elem['location']
                      for el in clean_list)
    if not yet_present:
        clean_list.append(elem)
OUTPUT:
[{'dashboard': 'AG',
'end_date': '2021-06-17 13:13:43',
'location': 'EC & pH Reading',
'zone_name': 'Zone 1 Left'},
{'dashboard': 'AG',
'end_date': '2021-06-17 12:40:06',
'location': 'Harvest',
'zone_name': 'Zone 2 Left'},
{'dashboard': 'AG',
'end_date': '2021-06-16 15:52:52',
'location': 'Harvest',
'zone_name': 'Zone 1 Left'}]
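The membership check above rescans clean_list for every element, which gets slow on long lists. A minimal sketch of the same idea with a set of already-seen (zone_name, location) pairs follows; the function name dedupe_first_seen is only illustrative, and it assumes the list is already sorted newest first, as described in the question, so that the first entry seen for each pair is the latest one.

```python
def dedupe_first_seen(rows):
    """Keep the first row seen for each (zone_name, location) pair."""
    seen = set()   # pairs that already have a row in `kept`
    kept = []
    for row in rows:
        key = (row["zone_name"], row["location"])
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Sample data from the question, sorted newest first.
lst = [
    {'dashboard': 'AG', 'end_date': '2021-06-17 13:13:43', 'location': 'EC & pH Reading', 'zone_name': 'Zone 1 Left'},
    {'dashboard': 'AG', 'end_date': '2021-06-17 12:40:06', 'location': 'Harvest', 'zone_name': 'Zone 2 Left'},
    {'dashboard': 'AG', 'end_date': '2021-06-16 15:52:52', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'},
    {'dashboard': 'AG', 'end_date': '2021-06-16 15:45:51', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'},
]

clean_list = dedupe_first_seen(lst)
print([row["end_date"] for row in clean_list])
```

The set lookup is constant time on average, so the whole pass is linear in the length of the list.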
Upvotes: 0
Reputation: 2894
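You can just loop through the list and remember the index you want to keep for each key. Later entries overwrite earlier ones, so the last entry seen for each key is the one kept; that is the latest record only if the list is sorted oldest first.
keepers = {}
for i in range(len(sorted_array)):
    key = (sorted_array[i]['location'], sorted_array[i]['zone_name'])
    keepers[key] = i  # will be overwritten if the same location and zone_name repeat
final_array = []
for i in keepers.values():
    final_array.append(sorted_array[i])
As a bonus, you get all (location, zone_name) pairs in keepers.keys().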
But your approach might actually also work. Just change sch.append(i) to res_list.append(i) and change the order of the iterable (for i in sorted_array[::-1]), so the last and not the first one gets kept.
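If you would rather not depend on the sort order at all, the same "remember one entry per key" idea can compare timestamps directly. This is only a sketch under assumptions: keep_latest and DATE_FORMAT are hypothetical names, the format string is inferred from the sample data, and the key is the (location, zone_name) pair the question asks about.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed from the sample end_date values


def keep_latest(rows):
    """Return one row per (location, zone_name): the one with the newest end_date."""
    latest = {}  # key -> (parsed timestamp, row)
    for row in rows:
        key = (row["location"], row["zone_name"])
        stamp = datetime.strptime(row["end_date"], DATE_FORMAT)
        # Store the row if the key is new or this row is more recent.
        if key not in latest or stamp > latest[key][0]:
            latest[key] = (stamp, row)
    return [row for _, row in latest.values()]


# Sample data from the question in arbitrary order.
sorted_array = [
    {'dashboard': 'AG', 'end_date': '2021-06-16 15:52:52', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'},
    {'dashboard': 'AG', 'end_date': '2021-06-17 13:13:43', 'location': 'EC & pH Reading', 'zone_name': 'Zone 1 Left'},
    {'dashboard': 'AG', 'end_date': '2021-06-16 15:45:51', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'},
    {'dashboard': 'AG', 'end_date': '2021-06-17 12:40:06', 'location': 'Harvest', 'zone_name': 'Zone 2 Left'},
]

final_array = keep_latest(sorted_array)
print(len(final_array))
```

Parsing the timestamps keeps the comparison correct even if the string format later changes to one that does not sort chronologically as text, provided DATE_FORMAT is updated to match.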
Upvotes: 0
Reputation: 18406
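Create a variable result, and for each dictionary item in the data list, check whether an item with the same zone_name is already in result; if it is, skip it, otherwise append it to the result list. Note that this compares zone_name only; to deduplicate on both fields, compare the (zone_name, location) pair instead.
result = []
for item in data:
    if item['zone_name'] in (x['zone_name'] for x in result):
        continue
    result.append(item)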
OUTPUT:
[{'dashboard': 'AG',
'end_date': '2021-06-17 13:13:43',
'location': 'EC & pH Reading',
'zone_name': 'Zone 1 Left'},
{'dashboard': 'AG',
'end_date': '2021-06-17 12:40:06',
'location': 'Harvest',
'zone_name': 'Zone 2 Left'}]
Upvotes: 0