Reputation: 177
import pandas as pd
import numpy as np
data = {'Name':['Tom', 'Tom', 'Jack', 'Terry'], 'Age':[20, 21, 19, 18]}
df = pd.DataFrame(data)
Lets say I have a dataframe that looks like this. I am trying to figure out how to check the Name column for the value 'Tom' and if I find it the first time I replace it with the value 'FirstTom' and the second time it appears I replace it with the value 'SecondTom'. How do you accomplish this? I've used the replace method before but only for replacing all Toms with a single value. I don't want to add a 1 on the end of the value, but completely change the string to something else.
Edit:
If the df looked more like this below, how would we check for Tom in the first column and the second column and then replace the first instance with FirstTom and the second instance with SecondTom
data = {'Name':['Tom', 'Jerry', 'Jack', 'Terry'], 'OtherName':[Tom, John, Bob,Steve]}
Upvotes: 12
Views: 5075
Reputation: 862541
EDIT: For count duplicated per rows use:
df = pd.DataFrame(data = {'Name':['Tom', 'Jerry', 'Jack', 'Terry'],
'OtherName':['Tom', 'John', 'Bob','Steve'],
'Age':[20, 21, 19, 18]})
print (df)
Name OtherName Age
0 Tom Tom 20
1 Jerry John 21
2 Jack Bob 19
3 Terry Steve 18
import inflect
p = inflect.engine()
#map by function for dynamic counter
f = lambda i: p.number_to_words(p.ordinal(i))
#columns filled by names
cols = ['Name','OtherName']
#reshaped to MultiIndex Series
s = df[cols].stack()
#counter per groups
count = s.groupby([s.index.get_level_values(0),s]).cumcount().add(1)
#mask for filter duplicates
mask = s.reset_index().duplicated(['level_0',0], keep=False).values
#filter only duplicates and map, reshape back and add to original data
df[cols] = count[mask].map(f).unstack().add(df[cols], fill_value='')
print (df)
Name OtherName Age
0 firstTom secondTom 20
1 Jerry John 21
2 Jack Bob 19
3 Terry Steve 18
Use GroupBy.cumcount
with Series.map
, but only for duplicated values by Series.duplicated
:
data = {'Name':['Tom', 'Tom', 'Jack', 'Terry'], 'Age':[20, 21, 19, 18]}
df = pd.DataFrame(data)
nth = {
0: "First",
1: "Second",
2: "Third",
3: "Fourth"
}
mask = df.Name.duplicated(keep=False)
df.loc[mask, 'Name'] = df[mask].groupby('Name').cumcount().map(nth) + df.loc[mask, 'Name']
print (df)
Name Age
0 FirstTom 20
1 SecondTom 21
2 Jack 19
3 Terry 18
Dynamic dictionary should be like:
import inflect
p = inflect.engine()
mask = df.Name.duplicated(keep=False)
f = lambda i: p.number_to_words(p.ordinal(i))
df.loc[mask, 'Name'] = df[mask].groupby('Name').cumcount().add(1).map(f) + df.loc[mask, 'Name']
print (df)
Name Age
0 firstTom 20
1 secondTom 21
2 Jack 19
3 Terry 18
Upvotes: 8
Reputation: 323226
We can do cumcount
df.Name=df.Name+df.groupby('Name').cumcount().astype(str)
df
Name Age
0 Tom0 20
1 Tom1 21
2 Jack0 19
3 Terry0 18
Update
suf = lambda n: "%d%s"%(n,{1:"st",2:"nd",3:"rd"}.get(n if n<20 else n%10,"th"))
g=df.groupby('Name')
df.Name=df.Name.radd(g.cumcount().add(1).map(suf).mask(g.Name.transform('count')==1,''))
df
Name Age
0 1stTom 20
1 2ndTom 21
2 Jack 19
3 Terry 18
Update 2 for column
suf = lambda n: "%d%s"%(n,{1:"st",2:"nd",3:"rd"}.get(n if n<20 else n%10,"th"))
g=s.groupby([s.index.get_level_values(0),s])
s=s.radd(g.cumcount().add(1).map(suf).mask(g.transform('count')==1,''))
s=s.unstack()
Name OtherName
0 1stTom 2ndTom
1 Jerry John
2 Jack Bob
3 Terry Steve
Upvotes: 10
Reputation: 294228
transform
nth = ['First', 'Second', 'Third', 'Fourth']
def prefix(d):
n = len(d)
if n > 1:
return d.radd([nth[i] for i in range(n)])
else:
return d
df.assign(Name=df.groupby('Name').Name.transform(prefix))
Name Age
0 FirstTom 20
1 SecondTom 21
2 Jack 19
3 Terry 18
4 FirstSteve 17
5 SecondSteve 16
6 ThirdSteve 15
Upvotes: 6
Reputation: 75080
Just adding in to the existing solutions , you can use inflect
to create dynamic dictionary
import inflect
p = inflect.engine()
df['Name'] += df.groupby('Name').cumcount().add(1).map(p.ordinal).radd('_')
print(df)
Name Age
0 Tom_1st 20
1 Tom_2nd 21
2 Jack_1st 19
3 Terry_1st 18
Upvotes: 12