Reputation: 407
I have two dataframes where I am trying to replace substring on level1 of multi index with another substring but this fails
For example I have a dataframe df
Index0 Index1 0 1 2
A BX .2 .3 .9
CX .34 .55 .54
D EX .34 .44 .32
FX .43. .88. .06
I am trying to replace the Index1 substring X by Y so that my result
looks like as follows
Index0 Index1 0 1 2
A BY .2 .3 .9
CY .34 .55 .54
D EY .34 .44 .32
FY .43. .88. .06
I am using the following function
df.replace('X','Y')
however i get the following error
AttributeError Traceback (most recent call last)
<ipython-input-56-fc7014a2d950> in <module>()
8
9
---> 10 df.replace('X','Y')
AttributeError: 'MultiIndex' object has no attribute 'replace'
Upvotes: 2
Views: 2623
Reputation: 323376
Or Try this
df.index=pd.MultiIndex.from_tuples([(x[0], x[1].replace('X', 'Y')) for x in df.index])
df
Out[304]:
0 1 2
a aY -0.696181 -1.929523 -1.903956
bY 0.071061 -0.594185 -2.005251
b cY -0.097761 0.093667 1.780550
dY 0.127887 1.534395 0.352351
Upvotes: 3
Reputation: 40918
@cᴏʟᴅsᴘᴇᴇᴅ improved on my answer so I will leave just a slower alternate here...
import numpy as np
df = pd.DataFrame(np.random.randn(4,3),
index=[list('aabb'), [n + 'X' for n in list('abcd')]])
Here's an alternate method using reset_index
. This would be applicable if you wanted to replace in more than one column. The trick is that you can't use replace
on the Index so you have to "bring it into" the DataFrame.
new = (df.reset_index()
.select_dtypes(include=['object'])
.apply(lambda col: col.str.replace('X', 'Y')))
df.index = pd.MultiIndex.from_tuples(new.values.tolist())
Upvotes: 3
Reputation: 403120
You're doing more than you need to.
df
0 1 2
Index0 Index1
A BX .2 .3 0.90
CX .34 .55 0.54
D EX .34 .44 0.32
FX .43. .88. 0.06
Use pd.MultiIndex.from_arrays
and you can do this in one step.
df.index = pd.MultiIndex.from_arrays([df.index.get_level_values(0),
df.index.levels[1].str.replace('X', 'Y')])
df
0 1 2
Index0 Index1
A BY .2 .3 0.90
CY .34 .55 0.54
D EY .34 .44 0.32
FY .43. .88. 0.06
Performance
%%timeit
new = (df.reset_index()
.select_dtypes(include=['object'])
.apply(lambda col: col.str.replace('X', 'Y')))
df.index = pd.MultiIndex.from_tuples(new.values.tolist())
10 loops, best of 3: 93.5 ms per loop
Nearly 100ms
for a tiny dataframe. Contrast with:
%%timeit
df.index = pd.MultiIndex.from_arrays([df.index.get_level_values(0),
df.index.levels[1].str.replace('X', 'Y')])
1000 loops, best of 3: 934 µs per loop
Upvotes: 2