karthik

Reputation: 3

Remove Duplicates in list in python

I have a dynamic list:

[{'dashboard': 'AG', 'end_date': '2021-06-17 13:13:43', 'location': 'EC & pH Reading', 'zone_name': 'Zone 1 Left'},
 {'dashboard': 'AG', 'end_date': '2021-06-17 12:40:06', 'location': 'Harvest', 'zone_name': 'Zone 2 Left'},
 {'dashboard': 'AG', 'end_date': '2021-06-16 15:52:52', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'},
 {'dashboard': 'AG', 'end_date': '2021-06-16 15:45:51', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'}]

I want to remove duplicates based on zone_name and location. Zone 1 Left has 3 entries, and I want to drop the older ones. I have already sorted the list by end_date so that the latest entry comes first. Now I need to remove the duplicates based on zone_name and location.

This is what I have tried:

final_zone = []
res_list = []
for i in sortedArray:
     if i["location"] not in final_zone:
          sch.append(i)
          final_zone.append(i["location"])

What change do I need to make to remove the duplicates based on zone_name and location?

That is, Zone 1 Left has 3 entries and I need only the latest one.

Upvotes: 0

Views: 75

Answers (5)

user2390182

Reputation: 73450

For a general approach with an unsorted list:

from itertools import groupby
from operator import itemgetter

# sort and group key functions
f_sort = itemgetter("location", "zone_name", "end_date")  # sort key; with reverse=True the latest end_date leads each group
f_group = itemgetter("location", "zone_name")  # group the sorted rows by these fields

result = [
    next(g) for _, g in  # only take latest of each group
    groupby(sorted(array, key=f_sort, reverse=True), key=f_group)
]
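As a quick check, here is a sketch that applies the snippet above to the sample data from the question. The name array is simply assumed to hold that list. Note that comparing end_date as a string only works because the timestamps use a fixed-width, year-first format.

from itertools import groupby
from operator import itemgetter

# Sample data copied from the question
array = [
    {'dashboard': 'AG', 'end_date': '2021-06-17 13:13:43', 'location': 'EC & pH Reading', 'zone_name': 'Zone 1 Left'},
    {'dashboard': 'AG', 'end_date': '2021-06-17 12:40:06', 'location': 'Harvest', 'zone_name': 'Zone 2 Left'},
    {'dashboard': 'AG', 'end_date': '2021-06-16 15:52:52', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'},
    {'dashboard': 'AG', 'end_date': '2021-06-16 15:45:51', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'},
]

f_sort = itemgetter("location", "zone_name", "end_date")
f_group = itemgetter("location", "zone_name")

# Sort descending so the first row of every group is the latest one
result = [
    next(g) for _, g in
    groupby(sorted(array, key=f_sort, reverse=True), key=f_group)
]

for row in result:
    print(row["location"], "|", row["zone_name"], "|", row["end_date"])

# Harvest | Zone 2 Left | 2021-06-17 12:40:06
# Harvest | Zone 1 Left | 2021-06-16 15:52:52
# EC & pH Reading | Zone 1 Left | 2021-06-17 13:13:43

The result follows the descending sort order of (location, zone_name), not the original list order.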

See the documentation for itertools.groupby, operator.itemgetter and sorted, all of which are really handy in a lot of use cases.

Upvotes: 1

Tom McLean

Reputation: 6296

The other answers work, but I want to add a solution using pandas.

You can create a DataFrame from your list of dictionaries:

import pandas as pd
d = [{'dashboard': 'AG', 'end_date': '2021-06-17 13:13:43', 'location': 'EC & pH Reading', 'zone_name': 'Zone 1 Left'},
     {'dashboard': 'AG', 'end_date': '2021-06-17 12:40:06', 'location': 'Harvest', 'zone_name': 'Zone 2 Left'},
     {'dashboard': 'AG', 'end_date': '2021-06-16 15:52:52', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'},
     {'dashboard': 'AG', 'end_date': '2021-06-16 15:45:51', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'}]
df = pd.DataFrame(d)

This is what df looks like:

  dashboard             end_date         location    zone_name
0        AG  2021-06-17 13:13:43  EC & pH Reading  Zone 1 Left
1        AG  2021-06-17 12:40:06          Harvest  Zone 2 Left
2        AG  2021-06-16 15:52:52          Harvest  Zone 1 Left
3        AG  2021-06-16 15:45:51          Harvest  Zone 1 Left

It is a bit like a table in Excel.

Now with one line, you can do exactly what you want:

df.sort_values("end_date").drop_duplicates(["location", "zone_name"], keep="last")

output:

  dashboard             end_date         location    zone_name
2        AG  2021-06-16 15:52:52          Harvest  Zone 1 Left
1        AG  2021-06-17 12:40:06          Harvest  Zone 2 Left
0        AG  2021-06-17 13:13:43  EC & pH Reading  Zone 1 Left

Upvotes: 0

clean_list=[]

for elem in lst:
    # check whether an element with the same zone name and location
    # is already present in the clean list
    already_present = any(
        el['zone_name'] == elem['zone_name'] and el['location'] == elem['location']
        for el in clean_list
    )
    if not already_present:
        clean_list.append(elem)

OUTPUT:

[{'dashboard': 'AG',
  'end_date': '2021-06-17 13:13:43',
  'location': 'EC & pH Reading',
  'zone_name': 'Zone 1 Left'},
 {'dashboard': 'AG',
  'end_date': '2021-06-17 12:40:06',
  'location': 'Harvest',
  'zone_name': 'Zone 2 Left'},
 {'dashboard': 'AG',
  'end_date': '2021-06-16 15:52:52',
  'location': 'Harvest',
  'zone_name': 'Zone 1 Left'}]
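Rescanning the clean list for every element gets slow on long lists. A possible variation, sketched here, remembers the (zone_name, location) pairs already seen in a set. The function name dedupe_keep_first and its keys parameter are made up for this illustration, and the input is assumed to be sorted latest first, as in the question.

def dedupe_keep_first(rows, keys=("zone_name", "location")):
    """Return rows, keeping only the first occurrence of each key combination."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[k] for k in keys)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Sample data copied from the question (already sorted latest first)
lst = [
    {'dashboard': 'AG', 'end_date': '2021-06-17 13:13:43', 'location': 'EC & pH Reading', 'zone_name': 'Zone 1 Left'},
    {'dashboard': 'AG', 'end_date': '2021-06-17 12:40:06', 'location': 'Harvest', 'zone_name': 'Zone 2 Left'},
    {'dashboard': 'AG', 'end_date': '2021-06-16 15:52:52', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'},
    {'dashboard': 'AG', 'end_date': '2021-06-16 15:45:51', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'},
]

clean_list = dedupe_keep_first(lst)
print(len(clean_list))  # 3

Because only the first occurrence is kept, the order of the input decides which duplicate survives.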

Upvotes: 0

Martin Wettstein

Reputation: 2894

You can just loop through the list and remember the indices you want to keep.

keepers = {}
for i in range(len(sorted_array)):
    key = (sorted_array[i]['zone_name'], sorted_array[i]['location'])
    keepers[key] = i  # overwritten when the same zone_name and location repeat, so the last occurrence wins

final_array = []
for i in keepers.values():
    final_array.append(sorted_array[i])

As a bonus, you get all distinct (zone_name, location) pairs in keepers.keys().

But your approach might actually also work. Just change sch.append(i) to res_list.append(i) and change the order of the iterable (for i in sorted_array[::-1]), so the last and not the first one gets kept.
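If the list cannot be assumed to be sorted at all, the same dictionary idea can compare the timestamps directly. The following is only a sketch: the function name latest_per_key and the constant DATE_FORMAT are invented here, and the format string assumes every end_date looks like the ones in the question.

from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format, taken from the sample data

def latest_per_key(rows):
    """Keep the row with the greatest end_date per (zone_name, location) pair."""
    latest = {}
    for row in rows:
        key = (row["zone_name"], row["location"])
        stamp = datetime.strptime(row["end_date"], DATE_FORMAT)
        # Store the row if the key is new or this row is more recent
        if key not in latest or stamp > latest[key][0]:
            latest[key] = (stamp, row)
    return [row for _, row in latest.values()]

# Sample data from the question, deliberately in oldest-first order
rows = [
    {'dashboard': 'AG', 'end_date': '2021-06-16 15:45:51', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'},
    {'dashboard': 'AG', 'end_date': '2021-06-16 15:52:52', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'},
    {'dashboard': 'AG', 'end_date': '2021-06-17 12:40:06', 'location': 'Harvest', 'zone_name': 'Zone 2 Left'},
    {'dashboard': 'AG', 'end_date': '2021-06-17 13:13:43', 'location': 'EC & pH Reading', 'zone_name': 'Zone 1 Left'},
]

final_array = latest_per_key(rows)
print([r["end_date"] for r in final_array])
# ['2021-06-16 15:52:52', '2021-06-17 12:40:06', '2021-06-17 13:13:43']

The result is ordered by when each pair was first seen, so sort it again by end_date if the order matters.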

Upvotes: 0

ThePyGuy

Reputation: 18406

Create a list named result and, for each dictionary in the data list, check whether its zone_name is already present in result. If it is, skip the item; otherwise append it to result.

result = []
for item in data:
    if item['zone_name'] in (x['zone_name'] for x in result):
        continue
    result.append(item)

OUTPUT:

[{'dashboard': 'AG',
  'end_date': '2021-06-17 13:13:43',
  'location': 'EC & pH Reading',
  'zone_name': 'Zone 1 Left'},
 {'dashboard': 'AG',
  'end_date': '2021-06-17 12:40:06',
  'location': 'Harvest',
  'zone_name': 'Zone 2 Left'}]

Upvotes: 0
