Reputation: 1345
Take the below example. To replace one string in one particular column I have done this and it worked fine:
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
'data1': range(6),
'data2': ['A1', 'B1', 'C1', 'A1', 'B1', 'C1']},
columns = ['key', 'data1', 'data2'])
key data1 data2
0 A 0 A1
1 B 1 B1
2 C 2 C1
3 A 3 A1
4 B 4 B1
5 C 5 C1
df['data2']= df['data2'].str.strip().str.replace("A1","Bad")
key data1 data2
0 A 0 Bad
1 B 1 B1
2 C 2 C1
3 A 3 Bad
4 B 4 B1
5 C 5 C1
Q(1) How can we conditionally replace one string? Meaning that, in column data2
, I would like to replace A1
but only if "key==A" and "data1">1
. How can I do that?
Q(2) Can the conditional replacement be applied to multiple replacement (i.e., replacing A1 and A2
at the same time with "Bad" but only under similar conditions?
Upvotes: 1
Views: 3484
Reputation: 1345
If we want to extend the example above in the following way:
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
'data1': range(6),
'data2': ['A1', 'B1', 'C1', 'A1', 'B1', 'C1']},
columns = ['key', 'data1', 'data2'])
mask = (df['data1'] > 1)
df.loc[mask, 'data2']= df.loc[mask, 'data2'].str.strip().str.replace("A1",df['key'])
key data1 data2
0 A 0 A1
1 B 1 B1
2 C 2 NaN
3 A 3 NaN
4 B 4 NaN
5 C 5 NaN
I am very surprised by the answer I thought the content of data2 would be replaced by content of column "key" (at the condition data1>1). any idea?
Upvotes: 0
Reputation: 10359
You can use numpy
and a regex
-based replacement to cover A1, A2
and more. if we extend your data to include an example with A3
:
import pandas as pd
import numpy as np
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C', 'A'],
'data1': range(7),
'data2': ['A1', 'B1', 'C1', 'A1', 'B1', 'C1', 'A3']},
columns=['key', 'data1', 'data2'])
df['data2'] = np.where((df['key'] == 'A') & (df['data1'] > 1),
df['data2'].str.replace(r'A\d+','Bad'),
df['data2'])
This returns:
key data1 data2
0 A 0 A1
1 B 1 B1
2 C 2 C1
3 A 3 Bad
4 B 4 B1
5 C 5 C1
6 A 6 Bad
Upvotes: 3
Reputation: 862661
I think need filter column in both sides with replace only for filtered rows:
mask = (df['key']=="A") & (df['data1'] > 1)
df.loc[mask, 'data2']= df.loc[mask, 'data2'].str.strip().str.replace("A1","Bad")
print (df)
key data1 data2
0 A 0 A1
1 B 1 B1
2 C 2 C1
3 A 3 Bad
4 B 4 B1
5 C 5 C1
If need multiple replace use replace
with dict
:
df = pd.DataFrame({'key': ['A', 'A', 'C', 'A', 'B', 'C'],
'data1': range(6),
'data2': ['A1', 'A2', 'C1', 'A1', 'B1', 'C1']},
columns = ['key', 'data1', 'data2'])
mask = (df['key']=="A") & (df['data1'] > 0)
df.loc[mask, 'data2']= df.loc[mask, 'data2'].str.strip().replace({"A1":"Bad", "A2":'Bad1'})
Or use regex:
df.loc[mask, 'data2']= df.loc[mask, 'data2'].str.strip().str.replace(r'^A.*',"Bad")
print (df)
key data1 data2
0 A 0 A1
1 A 1 Bad1
2 C 2 C1
3 A 3 Bad
4 B 4 B1
5 C 5 C1
Upvotes: 3