Frank State

Reputation: 13

Django - De-dupe a List and Keep the Dupes

I'd like to de-dupe the list below, but also keep a list of the duplicates to display on the following screen. The data is pulled from a CSV file, so it would be great to show the user what has been added and what has not been added (the "Dupes").

[
  ['first_name', 'last_name', 'email'],
  ['Danny', 'Lastnme', '[email protected]'],
  ['Sally', 'Surname', '[email protected]'],
  ['Sally', 'Surname', '[email protected]'],  # <-- Dupe
  ['Sally', 'Surname', '[email protected]'],  # <-- Dupe
  ['Chris', 'Lastnam', '[email protected]'],
  ['Larry', 'Seconds', '[email protected]'],
  ['Barry', 'Barrins', '[email protected]'],
  ['Glenn', 'Melting', '[email protected]'],
  ['Glenn', 'Melting', '[email protected]'],  # <-- Dupe
]

The end result should be two lists: one with the de-duped rows and one with the duplicates.

Unique:

[
  ['first_name', 'last_name', 'email'],
  ['Danny', 'Lastnme', '[email protected]'],
  ['Sally', 'Surname', '[email protected]'],
  ['Chris', 'Lastnam', '[email protected]'],
  ['Larry', 'Seconds', '[email protected]'],
  ['Barry', 'Barrins', '[email protected]'],
  ['Glenn', 'Melting', '[email protected]'],
]

Dupes:

[
  ['Sally', 'Surname', '[email protected]'],
  ['Sally', 'Surname', '[email protected]'],
  ['Glenn', 'Melting', '[email protected]'],
]

Upvotes: 0

Views: 134

Answers (3)

Parth Gajjar

Reputation: 1424

Try this one. This is the easiest way. Note that it sorts the rows, so the original order is not kept, and the header row is skipped.

name_list = [
    ['first_name', 'last_name', 'email'],
    ['Danny', 'Lastnme', '[email protected]'],
    ['Sally', 'Surname', '[email protected]'],
    ['Sally', 'Surname', '[email protected]'],  
    ['Sally', 'Surname', '[email protected]'], 
    ['Chris', 'Lastnam', '[email protected]'],
    ['Larry', 'Seconds', '[email protected]'],
    ['Barry', 'Barrins', '[email protected]'],
    ['Glenn', 'Melting', '[email protected]'],
    ['Glenn', 'Melting', '[email protected]'],
]
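# Skip the header row, then sort so that identical rows end up adjacent.
sorted_name_list = sorted(name_list[1:])
last_record = None
Unique = []
Dupes = []
for record in sorted_name_list:
    if last_record != record:
        Unique.append(record)
    else:
        Dupes.append(record)
    # Remember this row so the next one can be compared against it.
    last_record = record
print(Unique)
print(Dupes)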
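
The sort-and-compare-neighbours idea can also be written with itertools.groupby. This is only a sketch, assuming Python 3 and that the first row is a header; the function name split_sorted and the sample addresses are made up for illustration.

```python
from itertools import groupby

def split_sorted(rows):
    """Return (unique, dupes) for the data rows; rows[0] is assumed to be a header."""
    unique, dupes = [], []
    # Sorting puts identical rows next to each other, so groupby sees each
    # run of equal rows as one group.
    for _, group in groupby(sorted(rows[1:])):
        first, *rest = group  # first occurrence, then any repeats
        unique.append(first)
        dupes.extend(rest)
    return unique, dupes

# Made-up sample rows, not the addresses from the question.
sample = [
    ['first_name', 'last_name', 'email'],
    ['Sally', 'Surname', 'sally@example.com'],
    ['Danny', 'Lastnme', 'danny@example.com'],
    ['Sally', 'Surname', 'sally@example.com'],
]
unique, dupes = split_sorted(sample)
```

As with the loop above, the result comes back in sorted order rather than in the order of the CSV file.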

Upvotes: 1

YXD

Reputation: 32521

You can get the frequency of each row with collections.Counter:

from collections import Counter

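# `data` is the list from the question; data[0] is the header row.
t = Counter(tuple(x) for x in data[1:])

# Every distinct row once, plus one entry for each extra copy of a repeated row.
uniques = [list(k) for k in t]
dupes = [list(k) for k, v in t.items() for _ in range(v - 1)]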
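
Since the Counter holds the number of times each row occurred, it could also feed a summary on the follow-up screen. A minimal sketch, assuming the first row is a header; repeated_rows and the sample addresses are hypothetical names and data, not from the question.

```python
from collections import Counter

def repeated_rows(rows):
    """Map each data row that appears more than once to how often it appears."""
    # rows[0] is assumed to be the header and is left out of the count.
    counts = Counter(tuple(row) for row in rows[1:])
    return {row: n for row, n in counts.items() if n > 1}

# Made-up sample rows, not the addresses from the question.
sample = [
    ['first_name', 'last_name', 'email'],
    ['Danny', 'Lastnme', 'danny@example.com'],
    ['Sally', 'Surname', 'sally@example.com'],
    ['Sally', 'Surname', 'sally@example.com'],
]
repeats = repeated_rows(sample)
```

Here repeats maps the Sally row (as a tuple) to 2, so the template can say how many copies of each row were in the upload.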

Upvotes: 0

Ewan

Reputation: 15058

You can copy and paste this code to get a dictionary of dupes and uniques:

a = [
['first_name', 'last_name', 'email'],
['Danny', 'Lastnme', '[email protected]'],
['Sally', 'Surname', '[email protected]'],
['Sally', 'Surname', '[email protected]'],  
['Sally', 'Surname', '[email protected]'], 
['Chris', 'Lastnam', '[email protected]'],
['Larry', 'Seconds', '[email protected]'],
['Barry', 'Barrins', '[email protected]'],
['Glenn', 'Melting', '[email protected]'],
['Glenn', 'Melting', '[email protected]'],
]

result = {}

b = [tuple(x) for x in a[1:]]
all_uniques = set(b)
result['unique'] = [list(x) for x in list(all_uniques)]

# To show which rows have duplicates, use Mr E's solution:

from collections import Counter

t = Counter(b)
dupes = []

for k, v in t.items():
    if v > 1:
        # Add one copy of the row for every occurrence beyond the first.
        dupes.extend(list(k) for _ in range(v - 1))

result['dupes'] = dupes

print(result)
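
One caveat: set() does not keep the input order, so the unique list may come out shuffled. If the order of the CSV rows matters, a single pass with a "seen" set might do instead. This is a sketch under the assumption that the first row is a header that should stay at the top of the unique list, as in the question's expected output; split_in_order and the sample addresses are made up.

```python
def split_in_order(rows):
    """Return {'unique': [...], 'dupes': [...]} with rows in their original order."""
    seen = set()
    unique = [rows[0]]  # assumption: rows[0] is the header and is kept as-is
    dupes = []
    for row in rows[1:]:
        key = tuple(row)  # lists are not hashable, tuples are
        if key in seen:
            dupes.append(row)
        else:
            seen.add(key)
            unique.append(row)
    return {'unique': unique, 'dupes': dupes}

# Made-up sample rows, not the addresses from the question.
sample = [
    ['first_name', 'last_name', 'email'],
    ['Sally', 'Surname', 'sally@example.com'],
    ['Danny', 'Lastnme', 'danny@example.com'],
    ['Sally', 'Surname', 'sally@example.com'],
]
result = split_in_order(sample)
```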

Upvotes: 1
