Reputation: 1185
dfF:
Sample AlmostFinal
1 KOPLA234
1 KOPLA234
2 RWPLB253
3 MMPLA415
3 MMPLA415
I need to replace KOPL, RWP and MM with KOLPOL, and the last letter (A/B) should stay. So the result should be:
Sample AlmostFinal Final
1 KOPLA234 KOLPOLA234
1 KOPLA234 KOLPOLA234
2 RWPLB253 KOLPOLB253
3 MMPLA415 KOLPOLA415
3 MMPLA415 KOLPOLA415
I tried to do it by replace:
dfF['Final'] = (dfF['AlmostFinal'].replace({'KOPL':'KOLPOL'}, regex = True))
dfF['Final'] = (dfF['AlmostFinal'].replace({'RWP':'KOLPOL'}, regex = True))
dfF['Final'] = (dfF['AlmostFinal'].replace({'MMPL':'KOLPOL'}, regex = True))
If I comment out the 2nd and 3rd lines, the replacement for KOPL works.
When I comment out the 1st and 3rd lines, the replacement for RWP works.
But when I uncomment everything and run all 3 lines, only the last one works. Why? In another script I have similar code, and there every line works.
Upvotes: 1
Views: 198
Reputation: 403198
You can use a single replace call with regex=True:
df['Final'] = df['AlmostFinal'].replace(
[r'KOPL', r'RWP.*?(?=A|B)', r'MM.*(?=A|B)'], 'KOLPOL', regex=True)
df
Sample AlmostFinal Final
0 1 KOPLA234 KOLPOLA234
1 1 KOPLA234 KOLPOLA234
2 2 RWPLB253 KOLPOLB253
3 3 MMPLA415 KOLPOLA415
4 3 MMPLA415 KOLPOLA415
We want to be able to handle varying number of characters between the substrings and the last character, so regex with lookahead will be useful here.
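To see what the lookahead does on its own, here is a minimal sketch that uses Python's re module directly instead of pandas. The helper name swap_prefix and its parameters are made up for illustration. The (?=A|B) part only checks that an A or B follows; it does not consume the letter, so the letter survives the substitution.

```python
import re

def swap_prefix(code, prefixes=("KOPL", "RWP", "MM"), new="KOLPOL"):
    # Hypothetical helper: replace a known prefix, plus anything up to
    # (but not including) the last A or B, with `new`.
    pattern = "|".join(rf"{re.escape(p)}.*(?=A|B)" for p in prefixes)
    return re.sub(pattern, new, code)

# Sample values copied from the question.
results = [swap_prefix(s) for s in ["KOPLA234", "RWPLB253", "MMPLA415"]]
print(results)
```

Because .* is greedy, the match stops just before the last A or B in the string, which is fine for codes shaped like the ones in the question.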
Further generalisation is possible. Just define your substrings, then insert a lookahead via list comp.
pat = ['KOPL', 'RWP', 'MM']
df['Final'] = df['AlmostFinal'].replace(
[rf'{p}.*(?=A|B)' for p in pat], 'KOLPOL', regex=True) # need python3.6+
df
Sample AlmostFinal Final
0 1 KOPLA234 KOLPOLA234
1 1 KOPLA234 KOLPOLA234
2 2 RWPLB253 KOLPOLB253
3 3 MMPLA415 KOLPOLA415
4 3 MMPLA415 KOLPOLA415
If you want to replace specific substrings, the solution is a little simpler.
pat = ['KOPL', 'RWPL', 'MMPL']
df['AlmostFinal'].replace(pat, 'KOLPOL', regex=True)
0 KOLPOLA234
1 KOLPOLA234
2 KOLPOLB253
3 KOLPOLA415
4 KOLPOLA415
Name: AlmostFinal, dtype: object
No other modifications required. For more general replacements, see above.
Upvotes: 1
Reputation: 57115
You should execute one assignment, not three. Otherwise, each subsequent assignment overwrites the result of the previous one.
dfF['Final'] = dfF['AlmostFinal']\
.replace({'KOP|RWP|MMP': 'KOLPO'}, regex = True)
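A small sketch of the overwriting effect, assuming sample data copied from the question and the full prefixes KOPL, RWPL and MMPL as patterns. The variable names overwritten and combined are only there for illustration.

```python
import pandas as pd

dfF = pd.DataFrame({"Sample": [1, 2, 3],
                    "AlmostFinal": ["KOPLA234", "RWPLB253", "MMPLA415"]})

# Three separate assignments: each one starts again from 'AlmostFinal',
# so only the last replacement is visible in 'Final'.
dfF["Final"] = dfF["AlmostFinal"].replace({"KOPL": "KOLPOL"}, regex=True)
dfF["Final"] = dfF["AlmostFinal"].replace({"RWPL": "KOLPOL"}, regex=True)
dfF["Final"] = dfF["AlmostFinal"].replace({"MMPL": "KOLPOL"}, regex=True)
overwritten = dfF["Final"].tolist()

# One assignment with an alternation pattern handles all three prefixes.
dfF["Final"] = dfF["AlmostFinal"].replace(
    {"KOPL|RWPL|MMPL": "KOLPOL"}, regex=True)
combined = dfF["Final"].tolist()

print(overwritten)
print(combined)
```

In the first variant only the MMPL row ends up changed, which matches the behaviour described in the question.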
Upvotes: 1
Reputation: 42602
If I comment out the 2nd and 3rd lines, the replacement for KOPL works. When I comment out the 1st and 3rd lines, the replacement for RWP works. But when I uncomment everything and run all 3 lines, only the last one works. Why?
Because replace returns a new object instead of modifying the original, and since you're always doing the replacement on the original AlmostFinal column, each assignment throws away the result of the previous one.
Either do all replacements simultaneously, e.g. use a regex or, I guess, a single dict with multiple entries (not sure why you'd use a dict for a single value here, really):
{
'KOPL':'KOLPOL',
'RWPL':'KOLPOL',
'MMPL':'KOLPOL',
}
or perform each replace on the result of the previous one (either chain replace, or the second and third should work on df['Final']).
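Both options might look roughly like this. It is a sketch with sample values copied from the question; it uses the full prefixes RWPL and MMPL so the L is not duplicated in the output, and the column name Final2 is only there to keep the two results apart.

```python
import pandas as pd

df = pd.DataFrame({"AlmostFinal": ["KOPLA234", "RWPLB253", "MMPLA415"]})

# Option 1: chain the calls so each replace works on the previous result.
df["Final"] = (
    df["AlmostFinal"]
    .replace("KOPL", "KOLPOL", regex=True)
    .replace("RWPL", "KOLPOL", regex=True)
    .replace("MMPL", "KOLPOL", regex=True)
)
chained = df["Final"].tolist()

# Option 2: a single dict with one entry per prefix, applied in one call.
mapping = {"KOPL": "KOLPOL", "RWPL": "KOLPOL", "MMPL": "KOLPOL"}
df["Final2"] = df["AlmostFinal"].replace(mapping, regex=True)
from_dict = df["Final2"].tolist()

print(chained)
print(from_dict)
```

Chaining is safe here because the replacement text KOLPOL does not itself contain any of the later patterns.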
Upvotes: 1