Reputation: 411
I am working on the following Dataframe:
print (df)
LN FN
0 Smith Jason
1 Smith Pat
2 Smith Liz
3 Kim Jim
4 Hazel Vickie
5 Sun Sandra
I would like to filter the duplicated names on ['LN'] and put a first character of a name from ['FN']. In this example, I would like to add 'J', 'P', and 'L' to each 'Smith' with the space on ['LN'].
Desired output would be:
print (df)
LN FN
0 Smith J Jason
1 Smith P Pat
2 Smith L Liz
3 Kim Jim
4 Hazel Vickie
5 Sun Sandra
My attempt:
My code below achieved the desired output but there should be a cleaner and more pandas-like way of achieving this.
df1 = df.loc[df.duplicated('LN', False)]
df2 = pd.DataFrame(df1.LN + ' '+ df1.FN.str.get(0))
df3 = pd.concat([df1,df2], axis=1)
df3 = df3[[0, 'FN']]
df3.columns = ['LN', 'FN']
df.update(df3)
Thank you for your help on this!
Upvotes: 2
Views: 210
Reputation: 210842
you can do it this way:
In [41]: df.loc[df.LN.duplicated(keep=False), 'LN'] += ' ' + df.FN.str[0]
In [42]: df
Out[42]:
LN FN
0 Smith J Jason
1 Smith P Pat
2 Smith L Liz
3 Kim Jim
4 Hazel Vickie
5 Sun Sandra
Upvotes: 4