Pie

Reputation: 937

Python Check For Duplicate Values In An Object Array

I have an array that stores objects. I am trying to find out whether this array contains duplicates, comparing only one of each object's attributes (hexdigest).

How can I check for duplicates and record the entire object for each duplicate I find?

from operator import attrgetter

# class to store hashes
class customclass: 
    def __init__(self, value, hexdigest): 
        self.value = value 
        self.hexdigest = hexdigest

# array to store class data
hash_array = []
hash_array.append(customclass(value=299, hexdigest='927'))
hash_array.append(customclass(value=207, hexdigest='92b'))
hash_array.append(customclass(value=113, hexdigest='951'))
hash_array.append(customclass(value=187, hexdigest='951'))
hash_array.append(customclass(value=205, hexdigest='998'))

# sort array
sorted_array = sorted(hash_array, key=attrgetter('hexdigest'))

# check for duplicate hexdigest's
newlist = []
duplist = []
for obj in sorted_array:
    for jbo in newlist:
        if obj.hexdigest not in jbo:
            newlist.append(obj)
        else:
            duplist.append(obj) 

Upvotes: 0

Views: 572

Answers (2)

Chris

Reputation: 36601

Well, newlist is empty, so the inner for loop never runs, so nothing gets appended to newlist or duplist.
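To see this for yourself, here is a minimal reproduction of the loop from the question, trimmed to two objects (the shortened sorted_array is my own stand-in, not your full data). Note also that if the inner loop ever did run, the test obj.hexdigest not in jbo would raise a TypeError, since customclass defines neither __contains__ nor __iter__.

```python
class customclass:
    def __init__(self, value, hexdigest):
        self.value = value
        self.hexdigest = hexdigest

# Shortened stand-in for the sorted array in the question.
sorted_array = [customclass(113, '951'), customclass(187, '951')]

newlist = []
duplist = []
for obj in sorted_array:
    # newlist is empty on every pass, so this body is never executed
    for jbo in newlist:
        if obj.hexdigest not in jbo:
            newlist.append(obj)
        else:
            duplist.append(obj)

print(newlist, duplist)  # [] []
```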

You may wish to group by the hexdigest attribute using itertools.groupby and a dictionary comprehension.

from operator import attrgetter
from itertools import groupby

class customclass: 
    def __init__(self, value, hexdigest): 
        self.value = value 
        self.hexdigest = hexdigest

hash_array = []
hash_array.append(customclass(value=299, hexdigest='927'))
hash_array.append(customclass(value=207, hexdigest='92b'))
hash_array.append(customclass(value=113, hexdigest='951'))
hash_array.append(customclass(value=187, hexdigest='951'))
hash_array.append(customclass(value=205, hexdigest='998'))

sorted_array = sorted(hash_array, key=attrgetter('hexdigest'))
# [<__main__.customclass object at 0x7f488d1a2a20>, 
#  <__main__.customclass object at 0x7f488d1a29b0>, 
#  <__main__.customclass object at 0x7f488d1a2b00>, 
#  <__main__.customclass object at 0x7f488d1a2b70>, 
#  <__main__.customclass object at 0x7f488d1a2c18>]

groups = groupby(sorted_array, key=attrgetter('hexdigest'))

{k: list(v) for k, v in groups}
# {'927': [<__main__.customclass object at 0x7f488d1a2a20>], 
#  '92b': [<__main__.customclass object at 0x7f488d1a29b0>], 
#  '951': [<__main__.customclass object at 0x7f488d1a2b00>,  
#          <__main__.customclass object at 0x7f488d1a2b70>], 
#  '998': [<__main__.customclass object at 0x7f488d1a2c18>]}

From there it's relatively easy to retrieve the unique and duplicate values.
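As a rough sketch of that last step (the names grouped, unique and duplicates are just ones I picked for illustration), you could split the groups by their length: a group with one member is unique, anything longer is a set of duplicates.

```python
from itertools import groupby
from operator import attrgetter

class customclass:
    def __init__(self, value, hexdigest):
        self.value = value
        self.hexdigest = hexdigest

hash_array = [
    customclass(value=299, hexdigest='927'),
    customclass(value=207, hexdigest='92b'),
    customclass(value=113, hexdigest='951'),
    customclass(value=187, hexdigest='951'),
    customclass(value=205, hexdigest='998'),
]

# groupby only merges adjacent items, so the input must be sorted by the same key.
sorted_array = sorted(hash_array, key=attrgetter('hexdigest'))
grouped = {k: list(v) for k, v in groupby(sorted_array, key=attrgetter('hexdigest'))}

# Objects whose hexdigest occurs exactly once.
unique = [objs[0] for objs in grouped.values() if len(objs) == 1]
# Every object sharing a hexdigest with at least one other object, keyed by hexdigest.
duplicates = {k: objs for k, objs in grouped.items() if len(objs) > 1}

print([o.value for o in unique])
# [299, 207, 205]
print({k: [o.value for o in objs] for k, objs in duplicates.items()})
# {'951': [113, 187]}
```

The duplicates dictionary holds the whole objects, so both value and hexdigest stay available for each one.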

It may be easier to visualize what's going on if you provide a more useful definition for __repr__.

class customclass: 
    def __init__(self, value, hexdigest): 
        self.value = value 
        self.hexdigest = hexdigest
    def __repr__(self):
        return f"<customclass value: {self.value}, hexdigest: {self.hexdigest}>"

Doing so, hash_array prints in the interactive interpreter as follows, with the exception of the newlines I added for sanity's sake.

[<customclass value: 299, hexdigest: 927>, 
 <customclass value: 207, hexdigest: 92b>, 
 <customclass value: 113, hexdigest: 951>, 
 <customclass value: 187, hexdigest: 951>, 
 <customclass value: 205, hexdigest: 998>]

Upvotes: 0

Madhan Nadar

Reputation: 51

hex_list = []  # hexdigests seen so far
duplist = []   # objects whose hexdigest was already seen
for obj in sorted_array:
    if obj.hexdigest in hex_list:
        duplist.append(obj)
    else:
        hex_list.append(obj.hexdigest)

Use the block above in place of the loop below from your question. It appends to duplist every object whose hexdigest has already been seen, so the first object with a given hexdigest is not recorded.

newlist = []
duplist = []
for obj in sorted_array:
    for jbo in newlist:
        if obj.hexdigest not in jbo:
            newlist.append(obj)
        else:
            duplist.append(obj) 
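If you also want the first object of each duplicated hexdigest in the result, one possible variation (a sketch of a different technique, using collections.defaultdict; by_digest is a name I made up) is to bucket the objects by hexdigest in a single pass and then keep the buckets with more than one member. This does not need the array to be sorted.

```python
from collections import defaultdict

class customclass:
    def __init__(self, value, hexdigest):
        self.value = value
        self.hexdigest = hexdigest

hash_array = [
    customclass(value=299, hexdigest='927'),
    customclass(value=207, hexdigest='92b'),
    customclass(value=113, hexdigest='951'),
    customclass(value=187, hexdigest='951'),
    customclass(value=205, hexdigest='998'),
]

# Bucket every object under its hexdigest.
by_digest = defaultdict(list)
for obj in hash_array:
    by_digest[obj.hexdigest].append(obj)

# Flatten the buckets that hold more than one object.
duplist = [obj for objs in by_digest.values() if len(objs) > 1 for obj in objs]

print([(o.value, o.hexdigest) for o in duplist])
# [(113, '951'), (187, '951')]
```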

Upvotes: 1
