Rosa Alejandra
Rosa Alejandra

Reputation: 732

Sort multiIndex python pandas dataframe by the order they appear on another data frame

I have a data frame df like this:

                 a   b
 id   no   name   
T01  101   foo   1  $10
T32  102   bar   2  $30
T10  103   baz   4  $25

where the index are id, no and name. I have another data frame df2 with the index order I wish

    no
0  103
1  101
2  102

I need the data frame to be

                 a   b
 id   no   name  
T10  103   baz   4  $25 
T01  101   foo   1  $10
T32  102   bar   2  $30

I've seen to use df.loc[df2.no.values] and df.reindex(df2.no) but since I have multi indexed dataFrame it doesn't seem to work

What should I use to sort df in the key no with the order in df2?

Upvotes: 1

Views: 363

Answers (2)

jezrael
jezrael

Reputation: 863291

One possible solution with reset_index, reindex and last set_index:

print df1.reset_index(level=['id','name'])
         .reindex(df2.no)
         .reset_index()
         .set_index(['id','no','name'])

              a    b
id  no  name        
T10 103 baz   4  $25
T01 101 foo   1  $10
T32 102 bar   2  $30

If order of levels is not important:

print df1.reset_index(level=['id','name'])
         .reindex(df2.no)
         .set_index(['id','name'], append=True)

              a    b
no  id  name        
103 T10 baz   4  $25
101 T01 foo   1  $10
102 T32 bar   2  $30

TIMEING:

In [77]: %timeit df1.unstack([0, 2]).ix[df2.no].stack([1, 2]).swaplevel(0, 1)
10 loops, best of 3: 18.8 ms per loop

In [78]: %timeit df1.reset_index(level=['id','name']).reindex(df2.no).reset_index().set_index(['id','no','name'])
The slowest run took 4.41 times longer than the fastest. This could mean that an intermediate result is being cached 
100 loops, best of 3: 4.41 ms per loop

Upvotes: 1

piRSquared
piRSquared

Reputation: 294498

Solution

df.unstack([0, 2]).ix[df2.no].stack([1, 2]).swaplevel(0, 1)

Explanation

unstack([0, 2]) puts 1st and 3rd levels of index into [-2, -1] levels of columns. This isolates the level you care about.

ix[df2.no] orders the remaining level in the order you prefer.

stack([1, 2]) grabs levels from columns and puts them back into the index.

swaplevel(0, 1) puts the index levels back in the original order.

Upvotes: 0

Related Questions