Reputation: 335
I stumbled upon weird, inconsistent behavior in the pandas replace function when using it to swap two values in a column. When swapping integers in a column, we have:
import pandas as pd
df = pd.DataFrame({'A': [0, 1]})
df.A.replace({0: 1, 1: 0})
This yields the expected result (replace returns a new Series; df itself is not modified):
0    1
1    0
Name: A, dtype: int64
However, when using the same commands on string values:
df = pd.DataFrame({'B': ['a', 'b']})
df.B.replace({'a': 'b', 'b': 'a'})
we get:
0    a
1    a
Name: B, dtype: object
Can anyone explain this difference in behavior to me, or point me to a page in the docs that deals with inconsistencies between integers and strings in pandas?
Upvotes: 8
Views: 651
Reputation: 402603
Yup, this is definitely a bug, so I've opened a new issue: GH20656.
It looks like pandas applies the replacements successively. It makes the first replacement, turning "a" into "b", and then the second, turning both "b"s into "a".
In summary, what you see is equivalent to
df.B.replace('a', 'b').replace('b', 'a')
0 a
1 a
Name: B, dtype: object
That is definitely not what should happen.
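To make the difference concrete, here is a minimal pure-Python sketch (not pandas' actual implementation) contrasting the two semantics: applying each mapping pair in turn to the whole column, which reproduces the successive behavior described above, versus looking up each original value exactly once. The helper names replace_successively and replace_simultaneously are hypothetical.

```python
def replace_successively(values, mapping):
    """Apply each (old -> new) pair in turn to the whole list.

    Later pairs see the output of earlier ones, which reproduces
    the swap failure seen with the string column.
    """
    result = list(values)
    for old, new in mapping.items():
        result = [new if v == old else v for v in result]
    return result


def replace_simultaneously(values, mapping):
    """Look up every original value exactly once (the expected behavior)."""
    return [mapping.get(v, v) for v in values]


m = {'a': 'b', 'b': 'a'}
print(replace_successively(['a', 'b'], m))    # ['a', 'a']
print(replace_simultaneously(['a', 'b'], m))  # ['b', 'a']
```

A swap only works when every value is looked up in the original data, so any implementation that feeds one pair's output into the next pair will collapse the two values into one.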
There is a workaround using str.replace with a lambda callback:
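m = {'a': 'b', 'b': 'a'}
# regex=True is needed for a callable replacement (pandas >= 2.0 defaults to regex=False)
df.B.str.replace('|'.join(m.keys()), lambda x: m[x.group()], regex=True)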
0 b
1 a
Name: B, dtype: object
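Another workaround, not from the answer above, avoids regular expressions entirely: Series.map with a dict lookup that falls back to the original value. Because it matches whole cell values rather than substrings, a cell such as 'ab' would be left alone. A minimal sketch, assuming a small example frame with an extra unmapped value 'c':

```python
import pandas as pd

df = pd.DataFrame({'B': ['a', 'b', 'c']})
m = {'a': 'b', 'b': 'a'}

# Each original value is looked up once, so the swap cannot cascade.
# Values missing from m (here 'c') are kept as-is via dict.get's default.
swapped = df.B.map(lambda v: m.get(v, v))
print(swapped.tolist())  # ['b', 'a', 'c']
```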
Upvotes: 5