AlexW

Reputation: 2587

Python - Return a unique list of objects

I am trying to get a unique list of objects. I have some code that pulls data from an API and puts that data into objects, which I then put in a list. However, some of the objects are duplicates, and I would like to know how to remove them.

Sample list data:

[
Policy: 'SQL', 
SecondaryPolicy: 'ORACLE', 
Level: 'Primary On Call Engineer',
LevelNo: 1, 
StartDate: None, 
EndDate: None, 
StartTime: None, 
EndTime: None, 
Name: 'Fred', 
Mobile: '123', 

Policy: 'Comms', 
SecondaryPolicy: '', 
Level: 'Primary On Call Engineer',
LevelNo: 1, 
StartDate: None, 
EndDate: None, 
StartTime: None, 
EndTime: None, 
Name: 'Bob', 
Mobile: '456', 

Policy: 'Infra', 
SecondaryPolicy: '', 
Level: 'Primary On Call Engineer',
LevelNo: 1, 
StartDate: None, 
EndDate: None, 
StartTime: None, 
EndTime: None, 
Name: 'Bill', 
Mobile: '789', 

Policy: 'Comms', 
SecondaryPolicy: '', 
Level: 'Primary On Call Engineer',
LevelNo: 1, 
StartDate: None, 
EndDate: None, 
StartTime: None, 
EndTime: None, 
Name: 'Bob', 
Mobile: '456', 
]

Code (I've removed some of the object data and put in sample data; for this test I'm just trying to get Fred's result returned once):

objPolicyData = getUserData()

OnCallData = [] 
for UserItem in objPolicyData['users']:   
    UserData = User()     
    #get the user object from DB
    UserData.Name   = 'Fred'
    for OnCall in UserItem['on_call']:    
        UserPolicy = OnCall['escalation_policy'] 
        UserData.Policy          = 'SQL'
        UserData.SecondaryPolicy = 'ORACLE'
        OnCallData.append(UserData)

Attempts: I tried this:

clean_on_call_data = {User.Name for User in OnCallData}

but this only prints

set(['Fred'])

Where are the other fields of the objects, and how would I iterate over the result?

EDIT: This is my class. Is the __cmp__ correct? How do I remove the duplicates?

class User(object):
    __attrs = ['Policy','SecondaryPolicy','Name']

    def __init__(self, **kwargs):
        for attr in self.__attrs:
            setattr(self, attr, kwargs.get(attr, None))

    def __repr__(self):
        return ', '.join(
            ['%s: %r' % (attr, getattr(self, attr)) for attr in self.__attrs])  

    def __cmp__(self):     
        if self.Name != other.Name:  

Upvotes: 2

Views: 207

Answers (3)

Greg Hilston

Reputation: 2424

For Python 2.x

I think you'll want to implement __cmp__ for your class that stores the API data.

For Python 3.x

I think you'll want to implement __eq__ and __hash__ for your class that stores the API data.

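Regardless of which version of Python you use, you can rely on these comparison methods to check for duplicates in your list. The simplest way is set(list), which works when the class defines both __eq__ and __hash__: a set is an unordered collection of unique hashable objects, so it uses __hash__ to find candidates and __eq__ to confirm that two objects are equal.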
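As a minimal sketch of that idea, here is a cut-down version of the question's User class with __eq__ and __hash__ added. The _key helper and the choice to compare all three attributes are assumptions made for illustration; compare whichever fields define a duplicate for you.

```python
class User(object):
    """Cut-down stand-in for the question's User class (illustrative only)."""
    _attrs = ('Policy', 'SecondaryPolicy', 'Name')

    def __init__(self, **kwargs):
        for attr in self._attrs:
            setattr(self, attr, kwargs.get(attr))

    def _key(self):
        # Hypothetical helper: the tuple of fields that defines "the same user".
        return tuple(getattr(self, attr) for attr in self._attrs)

    def __eq__(self, other):
        if not isinstance(other, User):
            return NotImplemented
        return self._key() == other._key()

    def __ne__(self, other):
        # Only needed on Python 2, where != does not fall back to __eq__.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Equal objects must have equal hashes, so hash the same key.
        return hash(self._key())

    def __repr__(self):
        return ', '.join('%s: %r' % (a, getattr(self, a)) for a in self._attrs)


on_call_data = [
    User(Policy='SQL', SecondaryPolicy='ORACLE', Name='Fred'),
    User(Policy='Comms', SecondaryPolicy='', Name='Bob'),
    User(Policy='Infra', SecondaryPolicy='', Name='Bill'),
    User(Policy='Comms', SecondaryPolicy='', Name='Bob'),
]

# Duplicates collapse into one entry; the iteration order is not guaranteed.
unique_users = set(on_call_data)
for user in unique_users:
    print(user)
```

Each element of the set is still a full User object, so every field remains available while iterating.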

Upvotes: 2

noteness

Reputation: 2520

You could subclass the User class and implement the __eq__ and __hash__ methods, then just add the objects to a set, like this:

class UserUnique(User):
    def __hash__(self):
        return hash(self.Name)
    def __eq__(self, o):
        return self.Name == o.Name

Then you can do this:

OnCallData = set()
for UserItem in objPolicyData['users']:   
    UserData = UserUnique()     
    UserData.Name = 'Fred'
    for OnCall in UserItem['on_call']:    
        UserPolicy = OnCall['escalation_policy'] 
        UserData.Policy = 'SQL'
        UserData.SecondaryPolicy = 'ORACLE'
        OnCallData.add(UserData)
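Two things are worth keeping in mind with this approach: a set does not preserve insertion order, and hashing on Name alone treats two entries with the same Name but different policies as duplicates. The sketch below keeps the first occurrence of each user in order. The User stand-in and the rows sample data are hypothetical placeholders for the real class and API response.

```python
class User(object):
    # Minimal stand-in for the question's class (illustrative only).
    def __init__(self, Name=None, Policy=None, SecondaryPolicy=None):
        self.Name = Name
        self.Policy = Policy
        self.SecondaryPolicy = SecondaryPolicy


class UserUnique(User):
    def __hash__(self):
        return hash(self.Name)

    def __eq__(self, other):
        return self.Name == other.Name


# Hypothetical sample input: (name, policy) pairs standing in for the API data.
rows = [('Fred', 'SQL'), ('Bob', 'Comms'), ('Bill', 'Infra'), ('Bob', 'Comms')]

seen = set()
unique_in_order = []
for name, policy in rows:
    user = UserUnique(Name=name, Policy=policy)  # a new object for every row
    if user not in seen:  # the membership test uses __hash__ and __eq__
        seen.add(user)
        unique_in_order.append(user)

names = [user.Name for user in unique_in_order]
print(names)
```

Because the list holds whole objects rather than only the names, fields such as Policy can still be read from each element while iterating.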

Upvotes: 0

Luis

Reputation: 3497

How about using dictionaries and then a pandas.DataFrame?

Something like:

import pandas as pd

d1 = {
'Policy': 'SQL', 
'SecondaryPolicy': 'ORACLE', 
'Level': 'Primary On Call Engineer',
'LevelNo': 1, 
'StartDate': None, 
'EndDate': None, 
'StartTime': None, 
'EndTime': None, 
'Name': 'Fred', 
'Mobile': '123', 
}
d2 = {
'Policy': 'Comms', 
'SecondaryPolicy': '', 
'Level': 'Primary On Call Engineer',
'LevelNo': 1, 
'StartDate': None, 
'EndDate': None, 
'StartTime': None, 
'EndTime': None, 
'Name': 'Bob', 
'Mobile': '456', 
}
d3 = {
'Policy': 'Infra', 
'SecondaryPolicy': '', 
'Level': 'Primary On Call Engineer',
'LevelNo': 1, 
'StartDate': None, 
'EndDate': None, 
'StartTime': None, 
'EndTime': None, 
'Name': 'Bill', 
'Mobile': '789', 
}
d4 = {
'Policy': 'Comms', 
'SecondaryPolicy': '', 
'Level': 'Primary On Call Engineer',
'LevelNo': 1, 
'StartDate': None, 
'EndDate': None, 
'StartTime': None, 
'EndTime': None, 
'Name': 'Bob', 
'Mobile': '456', 
}


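data = pd.DataFrame([d1, d2, d3, d4])

# Drop rows that are exact duplicates (d2 and d4 are identical), keeping the first
data = data.drop_duplicates()

data[data.Name == 'Fred']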

Which outputs the single row where Name is 'Fred'.

Upvotes: 0
