John
John

Reputation: 551

Pandas unstack should not sort remaining indexes

I was asking myself whether it is possible to unstack one level of a multiindex dataframe such that the remaining indexes of the returned dataframe are not sorted! Code example:

arrays = [["room1", "room1", "room1", "room1", "room1", "room1",
           "room2", "room2", "room2", "room2", "room2", "room2"],
          ["bed1", "bed1", "bed1", "bed2", "bed2", "bed2",
           "bed1", "bed1", "bed1", "bed2", "bed2", "bed2"],
          ["blankets", "pillows", "all", "blankets", "pillows", "all",
           "blankets", "pillows", "all", "blankets", "pillows", "all"]]

tuples = list(zip(*arrays))

index = pd.MultiIndex.from_tuples(tuples, names=['first index', 
                                                 'second index', 'third index'])

series = pd.Series([1, 2, 3, 1, 1, 2, 2, 2, 4, 2, 1, 3 ], index=index)

series

first index  second index  third index
room1        bed1          blankets       1
                           pillows        2
                           all            3
             bed2          blankets       1
                           pillows        1
                           all            2
room2        bed1          blankets       2
                           pillows        2
                           all            4
             bed2          blankets       2
                           pillows        1
                           all            3

Unstacking the second index:

series.unstack(1)

second index             bed1  bed2
first index third index            
room1       all             3     2
            blankets        1     1
            pillows         2     1
room2       all             4     3
            blankets        2     2
            pillows         2     1

The problem is that the order of the third index has changed since the index was sorted automatically and alphabetically. Now, the line 'all' which is the sum of the lines 'blankets' and 'pillow' is the first line instead of the last line. So how can this be fixed? There does not seem to be an option that stops 'unstack' from sorting automatically. Also, there does not seem to be a possibility to sort the index of a dataframe by a key like myDataFrame.sort_index(..., key=['some_key']).

Upvotes: 3

Views: 774

Answers (1)

jezrael
jezrael

Reputation: 863226

One possible solution is reindex or reindex_axis with parameter level=1:

s = series.unstack(1).reindex(['blankets','pillows','all'], level=1)
print (s)
second index             bed1  bed2
first index third index            
room1       blankets        1     1
            pillows         2     1
            all             3     2
room2       blankets        2     2
            pillows         2     1
            all             4     3

s = series.unstack(1).reindex_axis(['blankets','pillows','all'], level=1)
print (s)
second index             bed1  bed2
first index third index            
room1       blankets        1     1
            pillows         2     1
            all             3     2
room2       blankets        2     2
            pillows         2     1
            all             4     3

More dynamic solution:

a = series.index.get_level_values('third index').unique()
print (a)
Index(['blankets', 'pillows', 'all'], dtype='object', name='third index')

s = series.unstack(1).reindex_axis(a, level=1)
print (s)
second index             bed1  bed2
first index third index            
room1       blankets        1     1
            pillows         2     1
            all             3     2
room2       blankets        2     2
            pillows         2     1
            all             4     3

Upvotes: 3

Related Questions