abhi_phoenix
abhi_phoenix

Reputation: 407

Replace operation on a dataframe MultiIndex

I have two dataframes where I am trying to replace substring on level1 of multi index with another substring but this fails

For example I have a dataframe df

Index0   Index1    0     1     2
A        BX       .2    .3    .9      
         CX       .34   .55   .54           

D        EX       .34   .44   .32
         FX       .43.  .88.  .06

I am trying to replace the Index1 substring X by Y so that my result
looks like as follows

Index0   Index1    0     1     2
A        BY       .2    .3    .9      
         CY       .34   .55   .54           

D        EY       .34   .44   .32
         FY       .43.  .88.  .06

I am using the following function

df.replace('X','Y')

however i get the following error

AttributeError                   Traceback (most recent   call last)
<ipython-input-56-fc7014a2d950> in <module>()
  8 
  9 
---> 10 df.replace('X','Y')

AttributeError: 'MultiIndex' object has no attribute 'replace'

Upvotes: 2

Views: 2623

Answers (3)

BENY
BENY

Reputation: 323376

Or Try this

df.index=pd.MultiIndex.from_tuples([(x[0], x[1].replace('X', 'Y')) for x in df.index])
df
Out[304]: 
             0         1         2
a aY -0.696181 -1.929523 -1.903956
  bY  0.071061 -0.594185 -2.005251
b cY -0.097761  0.093667  1.780550
  dY  0.127887  1.534395  0.352351

Upvotes: 3

Brad Solomon
Brad Solomon

Reputation: 40918

@cᴏʟᴅsᴘᴇᴇᴅ improved on my answer so I will leave just a slower alternate here...

import numpy as np
df = pd.DataFrame(np.random.randn(4,3), 
                  index=[list('aabb'), [n + 'X' for n in list('abcd')]])

Here's an alternate method using reset_index. This would be applicable if you wanted to replace in more than one column. The trick is that you can't use replace on the Index so you have to "bring it into" the DataFrame.

new = (df.reset_index()
           .select_dtypes(include=['object'])
           .apply(lambda col: col.str.replace('X', 'Y')))

df.index = pd.MultiIndex.from_tuples(new.values.tolist())

Upvotes: 3

cs95
cs95

Reputation: 403120

You're doing more than you need to.

df 
                  0     1     2
Index0 Index1                  
A      BX        .2    .3  0.90
       CX       .34   .55  0.54
D      EX       .34   .44  0.32
       FX      .43.  .88.  0.06

Use pd.MultiIndex.from_arrays and you can do this in one step.

df.index = pd.MultiIndex.from_arrays([df.index.get_level_values(0),
                                       df.index.levels[1].str.replace('X', 'Y')])

df
                  0     1     2
Index0 Index1                  
A      BY        .2    .3  0.90
       CY       .34   .55  0.54
D      EY       .34   .44  0.32
       FY      .43.  .88.  0.06

Performance

%%timeit
new = (df.reset_index()
            .select_dtypes(include=['object'])
            .apply(lambda col: col.str.replace('X', 'Y')))

df.index = pd.MultiIndex.from_tuples(new.values.tolist())

10 loops, best of 3: 93.5 ms per loop

Nearly 100ms for a tiny dataframe. Contrast with:

%%timeit
df.index = pd.MultiIndex.from_arrays([df.index.get_level_values(0),
                                        df.index.levels[1].str.replace('X', 'Y')])

1000 loops, best of 3: 934 µs per loop

Upvotes: 2

Related Questions