Reputation: 13
I'd like to de-dupe the list below, but also keep a list of the duplicates to display on the following screen. The data is pulled from a CSV file, so it'd be great to show the user what has been added and what hasn't (the "Dupes").
[
['first_name', 'last_name', 'email'],
['Danny', 'Lastnme', '[email protected]'],
['Sally', 'Surname', '[email protected]'],
['Sally', 'Surname', '[email protected]'], <-- Dupe
['Sally', 'Surname', '[email protected]'], <-- Dupe
['Chris', 'Lastnam', '[email protected]'],
['Larry', 'Seconds', '[email protected]'],
['Barry', 'Barrins', '[email protected]'],
['Glenn', 'Melting', '[email protected]'],
['Glenn', 'Melting', '[email protected]'], <-- Dupe
]
The ultimate result would be to generate two lists, one of the nice de-duped results and the other a list of the duplicates.
Unique:
[
['first_name', 'last_name', 'email'],
['Danny', 'Lastnme', '[email protected]'],
['Sally', 'Surname', '[email protected]'],
['Chris', 'Lastnam', '[email protected]'],
['Larry', 'Seconds', '[email protected]'],
['Barry', 'Barrins', '[email protected]'],
['Glenn', 'Melting', '[email protected]'],
]
Dupes:
[
['Sally', 'Surname', '[email protected]'],
['Sally', 'Surname', '[email protected]'],
['Glenn', 'Melting', '[email protected]'],
]
Upvotes: 0
Views: 134
Reputation: 1424
Try this one. This is the easiest way.
name_list = [
['first_name', 'last_name', 'email'],
['Danny', 'Lastnme', '[email protected]'],
['Sally', 'Surname', '[email protected]'],
['Sally', 'Surname', '[email protected]'],
['Sally', 'Surname', '[email protected]'],
['Chris', 'Lastnam', '[email protected]'],
['Larry', 'Seconds', '[email protected]'],
['Barry', 'Barrins', '[email protected]'],
['Glenn', 'Melting', '[email protected]'],
['Glenn', 'Melting', '[email protected]'],
]
# Sort the data rows (header skipped) so identical records end up adjacent.
sorted_name_list = sorted(name_list[1:])
last_record = None
Unique = []
Dupes = []
for record in sorted_name_list:
    if last_record != record:
        Unique.append(record)
    else:
        Dupes.append(record)
    last_record = record
print(Unique)
print(Dupes)
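Note that sorting reorders the rows, and `name_list[1:]` leaves the header out of both results. If the original order matters, one possible alternative is a single pass that remembers the rows already seen. This is only a sketch: `split_duplicates` is a made-up name, the first row is assumed to be the header, and the example.com addresses are placeholders, not the addresses from the question.

```python
def split_duplicates(rows):
    """Split rows into (unique, dupes), keeping the original order.

    The first row is assumed to be the header and is kept in `unique`.
    """
    header, records = rows[0], rows[1:]
    seen = set()          # tuples of the rows encountered so far
    unique = [header]
    dupes = []
    for record in records:
        key = tuple(record)   # lists are unhashable, tuples are not
        if key in seen:
            dupes.append(record)
        else:
            seen.add(key)
            unique.append(record)
    return unique, dupes


# Hypothetical sample data with placeholder addresses.
rows = [
    ['first_name', 'last_name', 'email'],
    ['Danny', 'Lastnme', 'danny@example.com'],
    ['Sally', 'Surname', 'sally@example.com'],
    ['Sally', 'Surname', 'sally@example.com'],
    ['Chris', 'Lastnam', 'chris@example.com'],
    ['Sally', 'Surname', 'sally@example.com'],
]
unique, dupes = split_duplicates(rows)
print(unique)
print(dupes)
```

Unlike the sorted version, this also catches duplicates that are not next to each other without changing the order of the rest.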
Upvotes: 1
Reputation: 32521
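You can get the frequency of each row with collections.Counter:
from collections import Counter
# data is the list of rows, header first; rows become tuples so they are hashable
t = Counter(tuple(x) for x in data[1:])
uniques = [list(k) for k in t]  # each distinct row once
dupes = [list(k) for k, v in t.items() for _ in range(v - 1)]  # every extra copy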
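For the "what was skipped" screen it may also help to show how many extra copies each row had. Below is a small self-contained sketch of the same Counter idea. The `data` list is hypothetical sample input with placeholder addresses, and `extra_copies` is a name invented for illustration.

```python
from collections import Counter

# Hypothetical sample input; the first row is the header.
data = [
    ['first_name', 'last_name', 'email'],
    ['Danny', 'Lastnme', 'danny@example.com'],
    ['Sally', 'Surname', 'sally@example.com'],
    ['Sally', 'Surname', 'sally@example.com'],
    ['Sally', 'Surname', 'sally@example.com'],
    ['Glenn', 'Melting', 'glenn@example.com'],
    ['Glenn', 'Melting', 'glenn@example.com'],
]

# Count how often each data row occurs (tuples, because lists are unhashable).
counts = Counter(tuple(row) for row in data[1:])

# Map each duplicated row to the number of copies beyond the first.
extra_copies = {row: n - 1 for row, n in counts.items() if n > 1}

for row, extra in extra_copies.items():
    print(', '.join(row), '- skipped', extra, 'duplicate(s)')
```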
Upvotes: 0
Reputation: 15058
You can copy and paste this code to get a dictionary of dupes and uniques:
a = [
['first_name', 'last_name', 'email'],
['Danny', 'Lastnme', '[email protected]'],
['Sally', 'Surname', '[email protected]'],
['Sally', 'Surname', '[email protected]'],
['Sally', 'Surname', '[email protected]'],
['Chris', 'Lastnam', '[email protected]'],
['Larry', 'Seconds', '[email protected]'],
['Barry', 'Barrins', '[email protected]'],
['Glenn', 'Melting', '[email protected]'],
['Glenn', 'Melting', '[email protected]'],
]
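result = {}
# Lists are not hashable, so convert each data row (header skipped) to a tuple.
b = [tuple(x) for x in a[1:]]
all_uniques = set(b)  # note: a set does not keep the original order
result['unique'] = [list(x) for x in all_uniques]
# To show which ones have duplicates, use Mr E's solution:
from collections import Counter
t = Counter(b)
dupes = []
for k, v in t.items():
    if v > 1:
        # one entry per extra copy of the row
        dupes.extend(list(k) for _ in range(v - 1))
result['dupes'] = dupes
print(result)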
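Since the question mentions that the rows come from a CSV file, here is a rough sketch of the same idea starting from CSV text, with the uniques kept in first-seen order via `dict.fromkeys` (dicts preserve insertion order on Python 3.7+). The CSV content, the placeholder addresses, and the `csv_text` name are assumptions for illustration. A real upload would be opened as a file instead of wrapped in `io.StringIO`.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV content standing in for the uploaded file.
csv_text = """first_name,last_name,email
Danny,Lastnme,danny@example.com
Sally,Surname,sally@example.com
Sally,Surname,sally@example.com
Barry,Barrins,barry@example.com
"""

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)                 # first row is assumed to be the header
b = [tuple(row) for row in reader]    # tuples so the rows are hashable

result = {
    # dict.fromkeys drops repeats while keeping first-seen order
    'unique': [header] + [list(k) for k in dict.fromkeys(b)],
    # one entry per extra copy of a row
    'dupes': [list(k) for k, v in Counter(b).items() for _ in range(v - 1)],
}
print(result)
```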
Upvotes: 1