Archi

Reputation: 79

loop through pandas dataframe to delete duplicates

For example, I have the following df:

names                 score    WhatIWant
Jones, Tom, Eddy      119.     Jones, Tom, Eddy
Nick, Tim, Bob        222.     Nick, Tim, Bob
Jones, Eddy, Luke     221.     Luke
Timmy, Jones, Sam     112.     Timmy, Sam

I need to loop through this dataframe and delete all duplicate names in the 'names' column, i.e. remove any name that has already appeared in an earlier row.

For example, Jones and Eddy appear in both row 1 and row 3, so I want them removed from row 3.

Upvotes: 1

Views: 287

Answers (1)

Yadnesh Salvi

Reputation: 195

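import pandas as pd

df = pd.DataFrame()
df['names'] = ['Jones, Tom, Eddy', 'Nick, Tim, Bob', 'Jones, Eddy, Luke', 'Timmy, Jones, Sam']
df['scores'] = [119, 222, 221, 112]

# Names seen in earlier rows; reset this list before re-running the apply below.
names_till_now = []

def get_unique_names(names_till_now, names_string):
    # Split the comma-separated string and trim whitespace around each name.
    names_list = [name.strip() for name in names_string.split(",")]
    # Keep only the names that have not appeared in an earlier row.
    names_unique = [name for name in names_list if name not in names_till_now]
    # Record this row's names so later rows can be checked against them.
    names_till_now += names_list
    return ', '.join(names_unique)

# Rows are processed top to bottom, so earlier rows take precedence.
df['names'] = df['names'].apply(lambda s: get_unique_names(names_till_now, s))
df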

The output is as follows:

    names                     scores
0   Jones, Tom, Eddy          119
1   Nick, Tim, Bob            222
2   Luke                      221
3   Timmy, Sam                112
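
One caveat with the approach above: names_till_now lives outside the function, so running the apply a second time without resetting the list removes every name. A possible variation, sketched below, keeps the "seen" state inside a helper and uses a set for faster lookups. The helper name drop_seen_names and its sep parameter are my own choices for illustration, not part of pandas. Unlike the code above, this sketch also drops a name that is repeated within the same row, which may or may not be what you want.

```python
import pandas as pd


def drop_seen_names(names_column: pd.Series, sep: str = ",") -> pd.Series:
    """Remove names that already appeared earlier in the column.

    Hypothetical helper: the name and the `sep` parameter are illustrative.
    """
    seen = set()   # names encountered so far; local, so each call starts fresh
    cleaned = []
    for names_string in names_column:
        kept = []
        for name in names_string.split(sep):
            name = name.strip()
            # Keep a name only the first time it is encountered.
            if name and name not in seen:
                seen.add(name)
                kept.append(name)
        cleaned.append(", ".join(kept))
    # Preserve the original index so the result aligns with the dataframe.
    return pd.Series(cleaned, index=names_column.index)


df = pd.DataFrame({
    "names": ["Jones, Tom, Eddy", "Nick, Tim, Bob", "Jones, Eddy, Luke", "Timmy, Jones, Sam"],
    "scores": [119, 222, 221, 112],
})
df["names"] = drop_seen_names(df["names"])
print(df)
```

Because the set is created inside the function, calling drop_seen_names again on the cleaned column leaves it unchanged.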

Upvotes: 1
