Reputation: 551
I was asking myself whether it is possible to unstack one level of a multiindex dataframe such that the remaining indexes of the returned dataframe are not sorted! Code example:
arrays = [["room1", "room1", "room1", "room1", "room1", "room1",
"room2", "room2", "room2", "room2", "room2", "room2"],
["bed1", "bed1", "bed1", "bed2", "bed2", "bed2",
"bed1", "bed1", "bed1", "bed2", "bed2", "bed2"],
["blankets", "pillows", "all", "blankets", "pillows", "all",
"blankets", "pillows", "all", "blankets", "pillows", "all"]]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first index',
'second index', 'third index'])
series = pd.Series([1, 2, 3, 1, 1, 2, 2, 2, 4, 2, 1, 3 ], index=index)
series
first index second index third index
room1 bed1 blankets 1
pillows 2
all 3
bed2 blankets 1
pillows 1
all 2
room2 bed1 blankets 2
pillows 2
all 4
bed2 blankets 2
pillows 1
all 3
Unstacking the second index:
series.unstack(1)
second index bed1 bed2
first index third index
room1 all 3 2
blankets 1 1
pillows 2 1
room2 all 4 3
blankets 2 2
pillows 2 1
The problem is that the order of the third index has changed since the index was sorted automatically and alphabetically. Now, the line 'all' which is the sum of the lines 'blankets' and 'pillow' is the first line instead of the last line. So how can this be fixed? There does not seem to be an option that stops 'unstack' from sorting automatically. Also, there does not seem to be a possibility to sort the index of a dataframe by a key like myDataFrame.sort_index(..., key=['some_key']).
Upvotes: 3
Views: 774
Reputation: 863226
One possible solution is reindex
or reindex_axis
with parameter level=1
:
s = series.unstack(1).reindex(['blankets','pillows','all'], level=1)
print (s)
second index bed1 bed2
first index third index
room1 blankets 1 1
pillows 2 1
all 3 2
room2 blankets 2 2
pillows 2 1
all 4 3
s = series.unstack(1).reindex_axis(['blankets','pillows','all'], level=1)
print (s)
second index bed1 bed2
first index third index
room1 blankets 1 1
pillows 2 1
all 3 2
room2 blankets 2 2
pillows 2 1
all 4 3
More dynamic solution:
a = series.index.get_level_values('third index').unique()
print (a)
Index(['blankets', 'pillows', 'all'], dtype='object', name='third index')
s = series.unstack(1).reindex_axis(a, level=1)
print (s)
second index bed1 bed2
first index third index
room1 blankets 1 1
pillows 2 1
all 3 2
room2 blankets 2 2
pillows 2 1
all 4 3
Upvotes: 3