E.g., I have the following df:

| names | score | WhatIWant |
|---|---|---|
| Jones, Tom, Eddy | 119 | Jones, Tom, Eddy |
| Nick, Tim, Bob | 222 | Nick, Tim, Bob |
| Jones, Eddy, Luke | 221 | Luke |
| Timmy, Jones, Sam | 112 | Timmy, Sam |
I need to loop through this dataframe and remove from the `names` column every name that has already appeared in an earlier row.
For example, Jones and Eddy appear in row 1 and again in row 3, so I want them removed from row 3 (likewise, Jones is removed from row 4).
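The rule being asked for (keep a name only if no earlier row contained it) comes down to a single pass with a set of names seen so far. A minimal sketch in plain Python, before bringing pandas in; `dedupe_names` is a hypothetical helper name, and the split assumes names are separated by commas with optional surrounding spaces:

```python
def dedupe_names(rows):
    """Drop every name that already appeared in an earlier row.

    `rows` is a list of comma-separated name strings (hypothetical helper;
    assumes ','-separated names with optional surrounding whitespace).
    """
    seen = set()
    result = []
    for row in rows:
        names = [n.strip() for n in row.split(",")]
        # Keep only names that no earlier row contained.
        kept = [n for n in names if n not in seen]
        # Mark this row's names as seen for all later rows.
        seen.update(names)
        result.append(", ".join(kept))
    return result


rows = ["Jones, Tom, Eddy", "Nick, Tim, Bob", "Jones, Eddy, Luke", "Timmy, Jones, Sam"]
print(dedupe_names(rows))
# ['Jones, Tom, Eddy', 'Nick, Tim, Bob', 'Luke', 'Timmy, Sam']
```

Matching is on whole names, so "Timmy" is kept even though "Tim" appeared earlier. A set rather than a list keeps each membership check constant-time on average.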
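
    import pandas as pd

    df = pd.DataFrame()
    df['names'] = ['Jones, Tom, Eddy', 'Nick, Tim, Bob', 'Jones, Eddy, Luke', 'Timmy, Jones, Sam']
    df['scores'] = [119, 222, 221, 112]

    # Names seen in earlier rows. Reset it (names_till_now = []) before
    # re-running the apply line, otherwise every name counts as already seen.
    names_till_now = []

    def get_unique_names(names_till_now, names_string):
        # Split "Jones, Tom, Eddy" into ['Jones', 'Tom', 'Eddy'].
        names_list = [name.strip() for name in names_string.split(",")]
        # Keep only names that did not appear in any earlier row.
        names_unique = [name for name in names_list if name not in names_till_now]
        # Record this row's names (extends the shared list in place).
        names_till_now += names_list
        return ', '.join(names_unique)

    df['names'] = df['names'].apply(lambda s: get_unique_names(names_till_now, s))
    print(df)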
The output is:

                  names  scores
    0  Jones, Tom, Eddy     119
    1    Nick, Tim, Bob     222
    2              Luke     221
    3        Timmy, Sam     112
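An alternative that avoids both the explicit loop and the shared `names_till_now` list is to explode the names into one entry each and let `Series.duplicated` flag repeats. This is a different technique from the answer above, sketched under two assumptions: the DataFrame has a unique index (the default `RangeIndex` here), and pandas is 0.25 or newer (for `Series.explode`). `drop_repeated_names` is a hypothetical helper name.

```python
import pandas as pd


def drop_repeated_names(names: pd.Series) -> pd.Series:
    # One entry per name; explode repeats the original row label for each name.
    exploded = names.str.split(",").explode().str.strip()
    # duplicated() (keep='first') flags every occurrence after the first,
    # scanning top to bottom, so each name survives only in its first row.
    first_seen = exploded[~exploded.duplicated()]
    # Re-join surviving names per original row; rows that lost every name
    # are missing from the groupby result, so reindex fills them with "".
    return (
        first_seen.groupby(level=0).agg(", ".join)
        .reindex(names.index, fill_value="")
    )


df = pd.DataFrame({
    "names": ["Jones, Tom, Eddy", "Nick, Tim, Bob", "Jones, Eddy, Luke", "Timmy, Jones, Sam"],
    "scores": [119, 222, 221, 112],
})
df["names"] = drop_repeated_names(df["names"])
print(df["names"].tolist())
# ['Jones, Tom, Eddy', 'Nick, Tim, Bob', 'Luke', 'Timmy, Sam']
```

Because the function keeps no state between calls, calling it again on the original column gives the same result, whereas the `names_till_now` list has to be emptied before the `apply` line is re-run.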