Reputation: 613
Given a dataframe like:
A B C
1 a yes
2 b yes
3 a no
I would like to change the dataframe to:
A B C
1 a yes
2 b no
3 a no
which means that if column B has the value 'b', I want to change the column C to 'no'. Which can be represented by df[df['B']=='b']['C'].str.replace('yes','no')
. But use this will not change dataframe df
itself. Even I tried df[df['B']=='b']['C'] = df[df['B']=='b']['C'].str.replace('yes','no')
it didn't work. I am wondering how to solve this problem.
Upvotes: 3
Views: 3045
Reputation: 862581
Solutions with set values by mask
:
df.loc[df.B == 'b', 'C'] = 'no'
print (df)
A B C
0 1 a yes
1 2 b no
2 3 a no
df['C'] = df['C'].mask(df.B == 'b','no')
print (df)
A B C
0 1 a yes
1 2 b no
2 3 a no
Solutions with replace only yes
string:
df.loc[df.B == 'b', 'C'] = df['C'].replace('yes', 'no')
print (df)
A B C
0 1 a yes
1 2 b no
2 3 a no
df['C'] = df['C'].mask(df.B == 'b', df['C'].replace('yes', 'no'))
print (df)
A B C
0 1 a yes
1 2 b no
2 3 a no
Difference better seen in changed df
:
print (df)
A B C
0 1 a yes
1 2 b yes
2 3 b another
3 4 a no
df['C_set'] = df['C'].mask(df.B == 'b','no')
df['C_replace'] = df['C'].mask(df.B == 'b', df['C'].replace('yes', 'no'))
print (df)
A B C C_set C_replace
0 1 a yes yes yes
1 2 b yes no no
2 3 b another no another
3 4 a no no no
EDIT:
In your solution is necessary only add loc
:
df.loc[df['B']=='b', 'C'] = df.loc[df['B']=='b', 'C'].str.replace('yes','no')
print (df)
A B C
0 1 a yes
1 2 b no
2 3 b another
3 4 a no
EDIT1:
I was really curious what method is fastest:
#[40000 rows x 3 columns]
df = pd.concat([df]*10000).reset_index(drop=True)
print (df)
In [37]: %timeit df.loc[df['B']=='b', 'C'] = df['C'].str.replace('yes','no')
10 loops, best of 3: 79.5 ms per loop
In [38]: %timeit df.loc[df['B']=='b', 'C'] = df.loc[df['B']=='b','C'].str.replace('yes','no')
10 loops, best of 3: 48.4 ms per loop
In [39]: %timeit df.loc[df['B']=='b', 'C'] = df.loc[df['B']=='b', 'C'].replace('yes','no')
100 loops, best of 3: 14.1 ms per loop
In [40]: %timeit df['C'] = df['C'].mask(df.B == 'b', df['C'].replace('yes', 'no'))
100 loops, best of 3: 10.1 ms per loop
# piRSquared solution with replace
In [53]: %timeit df.C = np.where(df.B.values == 'b', df.C.replace('yes', 'no'), df.C.values)
100 loops, best of 3: 4.74 ms per loop
EDIT1:
Better is change condition - add df.C == 'yes'
or df.C.values == 'yes'
if need fastest solution:
df.loc[(df.B == 'b') & (df.C == 'yes'), 'C'] = 'no'
df.C = np.where((df.B.values == 'b') & (df.C.values == 'yes'), 'no', df.C.values)
Upvotes: 5
Reputation: 294228
df.C = np.where(df.B == 'b', 'no', df.C)
Or
df.C = np.where(df.B.values == 'b', 'no', df.C.values)
df.C = df.C.mask(df.B == 'b', 'no')
All change df
in place and yield
A B C
0 1 a yes
1 2 b no
2 3 a no
Upvotes: 4