Reputation: 732
I have a data frame df
like this:
a b
id no name
T01 101 foo 1 $10
T32 102 bar 2 $30
T10 103 baz 4 $25
where the index are id, no and name
. I have another data frame df2
with the index order I wish
no
0 103
1 101
2 102
I need the data frame to be
a b
id no name
T10 103 baz 4 $25
T01 101 foo 1 $10
T32 102 bar 2 $30
I've seen to use df.loc[df2.no.values]
and df.reindex(df2.no)
but since I have multi indexed dataFrame it doesn't seem to work
What should I use to sort df
in the key no
with the order in df2
?
Upvotes: 1
Views: 363
Reputation: 863291
One possible solution with reset_index
, reindex
and last set_index
:
print df1.reset_index(level=['id','name'])
.reindex(df2.no)
.reset_index()
.set_index(['id','no','name'])
a b
id no name
T10 103 baz 4 $25
T01 101 foo 1 $10
T32 102 bar 2 $30
If order of levels is not important:
print df1.reset_index(level=['id','name'])
.reindex(df2.no)
.set_index(['id','name'], append=True)
a b
no id name
103 T10 baz 4 $25
101 T01 foo 1 $10
102 T32 bar 2 $30
TIMEING:
In [77]: %timeit df1.unstack([0, 2]).ix[df2.no].stack([1, 2]).swaplevel(0, 1)
10 loops, best of 3: 18.8 ms per loop
In [78]: %timeit df1.reset_index(level=['id','name']).reindex(df2.no).reset_index().set_index(['id','no','name'])
The slowest run took 4.41 times longer than the fastest. This could mean that an intermediate result is being cached
100 loops, best of 3: 4.41 ms per loop
Upvotes: 1
Reputation: 294498
df.unstack([0, 2]).ix[df2.no].stack([1, 2]).swaplevel(0, 1)
unstack([0, 2])
puts 1st and 3rd levels of index into [-2, -1] levels of columns. This isolates the level you care about.
ix[df2.no]
orders the remaining level in the order you prefer.
stack([1, 2])
grabs levels from columns and puts them back into the index.
swaplevel(0, 1)
puts the index levels back in the original order.
Upvotes: 0