Reputation: 13
I'm running an analysis to spot people who might be doing something to avoid the system. I created a new field in my DataFrame, Audit ID, that combines the name of the potential offender with the Issue Date. What I want is: if any other row has the same Audit ID, mark the row Yes; if not, No (or NaN).
So for example, I have:
Offender Name  Issue Date  Audit ID
Joe            12/02/2020  Joe-12/02/20
Nic            20/02/2020  Nic-20/02/20
Mat            01/02/2020  Mat-01/02/20
Joe            12/02/2020  Joe-12/02/20
And I want something like:
Offender Name  Issue Date  Audit ID      Matches
Joe            12/02/2020  Joe-12/02/20  Yes
Nic            20/02/2020  Nic-20/02/20  No
Mat            01/02/2020  Mat-01/02/20  No
Joe            12/02/2020  Joe-12/02/20  Yes
I'd appreciate any insights anyone can give me.
Upvotes: 0
Views: 51
Reputation: 5036
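You can flag duplicates with duplicated() and map the boolean result to 'Yes' and 'No'. Passing keep=False marks every row in a duplicated group, not only the repeats: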
df['Matches'] = df.duplicated('Audit ID', keep=False).map({True: 'Yes', False: 'No'})
df
Out:
   Offender Name  Issue Date      Audit ID  Matches
0            Joe  12/02/2020  Joe-12/02/20      Yes
1            Nic  20/02/2020  Nic-20/02/20       No
2            Mat  01/02/2020  Mat-01/02/20       No
3            Joe  12/02/2020  Joe-12/02/20      Yes
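For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample data from the question and also shows a variant that leaves non-duplicates as NaN, since the question mentions NaN as an option. The names is_dup and Matches_nan are just illustrative choices, not something from the original post.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Offender Name": ["Joe", "Nic", "Mat", "Joe"],
    "Issue Date": ["12/02/2020", "20/02/2020", "01/02/2020", "12/02/2020"],
    "Audit ID": ["Joe-12/02/20", "Nic-20/02/20", "Mat-01/02/20", "Joe-12/02/20"],
})

# keep=False flags every member of a duplicated group, not just the repeats.
is_dup = df.duplicated("Audit ID", keep=False)

# Yes/No labels, as in the desired output.
df["Matches"] = is_dup.map({True: "Yes", False: "No"})

# Variant (hypothetical column name): Yes for duplicates, NaN otherwise.
# Values that are missing from the mapping dict come out as NaN.
df["Matches_nan"] = is_dup.map({True: "Yes"})

print(df)
```

The NaN variant can be handy if you later want to drop or count the non-matching rows with dropna() or count().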
The Audit ID column is redundant. The same information is already in your DataFrame, so you can check for duplicates on Offender Name and Issue Date directly:
df['Matches'] = df.duplicated(['Offender Name', 'Issue Date'], keep=False).map({True: 'Yes', False: 'No'})
df
Out:
   Offender Name  Issue Date      Audit ID  Matches
0            Joe  12/02/2020  Joe-12/02/20      Yes
1            Nic  20/02/2020  Nic-20/02/20       No
2            Mat  01/02/2020  Mat-01/02/20       No
3            Joe  12/02/2020  Joe-12/02/20      Yes
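As a quick sanity check, the sketch below compares the two approaches on the sample data and then pulls out only the flagged rows. It assumes the Audit ID really is built from just the name and the date, as in the question; the variable names by_id, by_cols and flagged are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Offender Name": ["Joe", "Nic", "Mat", "Joe"],
    "Issue Date": ["12/02/2020", "20/02/2020", "01/02/2020", "12/02/2020"],
    "Audit ID": ["Joe-12/02/20", "Nic-20/02/20", "Mat-01/02/20", "Joe-12/02/20"],
})

# Duplicate mask computed from the combined key column.
by_id = df.duplicated("Audit ID", keep=False)

# Duplicate mask computed from the two source columns.
by_cols = df.duplicated(["Offender Name", "Issue Date"], keep=False)

# On this sample both masks agree.
print(by_id.equals(by_cols))  # True

# Boolean indexing keeps only the rows that belong to a duplicated group.
flagged = df[by_cols]
print(flagged)
```

If the two masks ever disagree on real data, that would suggest the Audit ID was not built consistently from the two source columns.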
Upvotes: 1