Reputation: 937
I have an array (a Python list) that stores objects. I am trying to see if there are duplicate values in this list of objects, but only for one of the objects' attributes (hexdigest).
How can I check for duplicates and record the entire object of duplicates I find?
from operator import attrgetter

# class to store hashes
class customclass:
    def __init__(self, value, hexdigest):
        self.value = value
        self.hexdigest = hexdigest

# array to store class data
hash_array = []
hash_array.append(customclass(value=299, hexdigest='927'))
hash_array.append(customclass(value=207, hexdigest='92b'))
hash_array.append(customclass(value=113, hexdigest='951'))
hash_array.append(customclass(value=187, hexdigest='951'))
hash_array.append(customclass(value=205, hexdigest='998'))

# sort array
sorted_array = sorted(hash_array, key=attrgetter('hexdigest'))

# check for duplicate hexdigests
newlist = []
duplist = []
for obj in sorted_array:
    for jbo in newlist:
        if obj.hexdigest not in jbo:
            newlist.append(obj)
        else:
            duplist.append(obj)
Upvotes: 0
Views: 572
Reputation: 36601
Well, newlist is empty, so the inner for loop never runs, and nothing ever gets appended to newlist or duplist.
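A minimal sketch of one way to repair the original loop while keeping its newlist/duplist structure. It assumes the intent is for newlist to hold the first object seen for each hexdigest and duplist to hold every later object whose hexdigest was already seen. The inner for loop is replaced by a check against the hexdigests of the objects already kept:

```python
from operator import attrgetter

class customclass:
    def __init__(self, value, hexdigest):
        self.value = value
        self.hexdigest = hexdigest

hash_array = [
    customclass(value=299, hexdigest='927'),
    customclass(value=207, hexdigest='92b'),
    customclass(value=113, hexdigest='951'),
    customclass(value=187, hexdigest='951'),
    customclass(value=205, hexdigest='998'),
]
sorted_array = sorted(hash_array, key=attrgetter('hexdigest'))

newlist = []  # first object seen for each hexdigest
duplist = []  # later objects whose hexdigest is already in newlist
for obj in sorted_array:
    # compare hexdigest to hexdigest, not hexdigest to a whole object
    if any(kept.hexdigest == obj.hexdigest for kept in newlist):
        duplist.append(obj)
    else:
        newlist.append(obj)

print([o.value for o in newlist])  # [299, 207, 113, 205]
print([o.value for o in duplist])  # [187]
```

The any() call scans newlist on every iteration, so this stays quadratic; it only fixes the logic, not the performance.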
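You may wish to group by the hexdigest attribute using itertools.groupby and a dictionary comprehension. Note that groupby only groups consecutive items with equal keys, which is why the list is sorted by the same key first.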
from operator import attrgetter
from itertools import groupby

class customclass:
    def __init__(self, value, hexdigest):
        self.value = value
        self.hexdigest = hexdigest

hash_array = []
hash_array.append(customclass(value=299, hexdigest='927'))
hash_array.append(customclass(value=207, hexdigest='92b'))
hash_array.append(customclass(value=113, hexdigest='951'))
hash_array.append(customclass(value=187, hexdigest='951'))
hash_array.append(customclass(value=205, hexdigest='998'))

sorted_array = sorted(hash_array, key=attrgetter('hexdigest'))
# [<__main__.customclass object at 0x7f488d1a2a20>,
#  <__main__.customclass object at 0x7f488d1a29b0>,
#  <__main__.customclass object at 0x7f488d1a2b00>,
#  <__main__.customclass object at 0x7f488d1a2b70>,
#  <__main__.customclass object at 0x7f488d1a2c18>]

groups = groupby(sorted_array, key=attrgetter('hexdigest'))
{k: list(v) for k, v in groups}
# {'927': [<__main__.customclass object at 0x7f488d1a2a20>],
#  '92b': [<__main__.customclass object at 0x7f488d1a29b0>],
#  '951': [<__main__.customclass object at 0x7f488d1a2b00>,
#          <__main__.customclass object at 0x7f488d1a2b70>],
#  '998': [<__main__.customclass object at 0x7f488d1a2c18>]}
From there it's relatively easy to retrieve the unique and duplicate values.
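For example, here is a sketch that builds the dictionary and then splits it by group size. The names grouped, duplicates and unique are illustrative and not from the answer above:

```python
from itertools import groupby
from operator import attrgetter

class customclass:
    def __init__(self, value, hexdigest):
        self.value = value
        self.hexdigest = hexdigest

hash_array = [
    customclass(value=299, hexdigest='927'),
    customclass(value=207, hexdigest='92b'),
    customclass(value=113, hexdigest='951'),
    customclass(value=187, hexdigest='951'),
    customclass(value=205, hexdigest='998'),
]

key = attrgetter('hexdigest')
# groupby needs the input sorted by the same key it groups on
grouped = {k: list(v) for k, v in groupby(sorted(hash_array, key=key), key=key)}

# hexdigests that occur more than once, mapped to every object that shares them
duplicates = {k: objs for k, objs in grouped.items() if len(objs) > 1}
# the objects whose hexdigest occurs exactly once
unique = [objs[0] for objs in grouped.values() if len(objs) == 1]

print({k: [o.value for o in objs] for k, objs in duplicates.items()})  # {'951': [113, 187]}
print([o.value for o in unique])  # [299, 207, 205]
```

Unlike a "seen before" loop, this keeps every object in a duplicate group, including the first occurrence.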
It may be easier to visualize what's going on if you provide a more useful definition for __repr__.
class customclass:
    def __init__(self, value, hexdigest):
        self.value = value
        self.hexdigest = hexdigest

    def __repr__(self):
        return f"<customclass value: {self.value}, hexdigest: {self.hexdigest}>"
With that definition, hash_array prints in the interactive interpreter as follows (except for the newlines, which I added for readability):
[<customclass value: 299, hexdigest: 927>,
<customclass value: 207, hexdigest: 92b>,
<customclass value: 113, hexdigest: 951>,
<customclass value: 187, hexdigest: 951>,
<customclass value: 205, hexdigest: 998>]
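As an alternative not mentioned in the answer, on Python 3.7 and later the standard dataclasses module can generate __init__ and a readable __repr__ automatically from annotated fields. A minimal sketch, assuming the same two attributes:

```python
from dataclasses import dataclass

@dataclass
class customclass:
    value: int       # annotations define the generated __init__ parameters
    hexdigest: str

hash_array = [
    customclass(value=299, hexdigest='927'),
    customclass(value=113, hexdigest='951'),
]
print(hash_array)
# [customclass(value=299, hexdigest='927'), customclass(value=113, hexdigest='951')]
```

The generated __repr__ uses repr() of each field, which is why the hexdigest strings appear quoted. The decorator also generates an __eq__ that compares all fields.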
Upvotes: 0
Reputation: 51
hex_list = []  # hexdigests seen so far
duplist = []   # objects whose hexdigest was already seen
for obj in sorted_array:
    if obj.hexdigest in hex_list:
        duplist.append(obj)
    else:
        hex_list.append(obj.hexdigest)
Use the block above instead of the one below, which you implemented to find the list of duplicate objects:
newlist = []
duplist = []
for obj in sorted_array:
    for jbo in newlist:
        if obj.hexdigest not in jbo:
            newlist.append(obj)
        else:
            duplist.append(obj)
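One possible refinement of the first block (a sketch, not the answerer's code): keeping the seen hexdigests in a set makes each membership test average O(1) instead of a linear scan of a list, and because the check does not depend on order, the sort is not needed. As in the block above, duplist holds only the second and later occurrences; the first object with a given hexdigest is not included.

```python
class customclass:
    def __init__(self, value, hexdigest):
        self.value = value
        self.hexdigest = hexdigest

hash_array = [
    customclass(value=299, hexdigest='927'),
    customclass(value=207, hexdigest='92b'),
    customclass(value=113, hexdigest='951'),
    customclass(value=187, hexdigest='951'),
    customclass(value=205, hexdigest='998'),
]

seen = set()   # hexdigests encountered so far
duplist = []   # objects whose hexdigest was already in seen
for obj in hash_array:  # no sort required
    if obj.hexdigest in seen:
        duplist.append(obj)
    else:
        seen.add(obj.hexdigest)

print([o.value for o in duplist])  # [187]
```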
Upvotes: 1