Create dataframe with labels sorted according to data

Question

Edit: I edited the example because the previous one could be interpreted in different ways.

I have a dataframe with row labels, where each column holds a ranking of those labels:

import pandas as pd
df = pd.DataFrame({'0': [3, 2, 1], '1': [2, 3, 1]}, index=['Red', 'Green', 'Blue'])

It looks like this (real data has more columns):

       0  1
Red    3  2
Green  2  3
Blue   1  1

I want to transform it into a matrix with color names sorted according to the ranks in each column.

For example, the first column is [3, 2, 1] and the result should be ['Blue', 'Green', 'Red'].

The second column is [2, 3, 1] and the result should be ['Blue', 'Red', 'Green'].

The numbers are the rank of each label in that column. (They are not indices into the labels array.) So if 'Red' has 2, it means it should be in the second cell in the column.
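To make the rank interpretation concrete, here is a minimal sketch that places each label by hand for the first column described above ([3, 2, 1]). The names `ranks` and `ordered` are only illustrative and are not part of the original question.

```python
import pandas as pd

# First column from the question: Red has rank 3, Green rank 2, Blue rank 1.
ranks = pd.Series([3, 2, 1], index=['Red', 'Green', 'Blue'])

# A rank r means "this label goes into cell r" (1-based), so write to slot r - 1.
ordered = [None] * len(ranks)
for label, rank in ranks.items():
    ordered[rank - 1] = label

print(ordered)  # ['Blue', 'Green', 'Red']
```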

jezrael · Accepted Answer

Use Series.sort_values on each column via DataFrame.apply, then take the index of the sorted result:

# Sort each column by rank and keep the resulting order of the labels
df1 = df.apply(lambda x: x.sort_values().index)
print(df1)
           0      1
Red     Blue   Blue
Green  Green    Red
Blue     Red  Green
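Note that the row labels in the output above are just the original index carried over; only the cell values are in rank order. As a possible alternative for wider data, the sketch below does the same reordering with numpy.argsort and uses the rank positions 1..n as the row index. It assumes the column values given in the question ([3, 2, 1] and [2, 3, 1]); the names `order`, `labels` and `ranked` are illustrative, not from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'0': [3, 2, 1], '1': [2, 3, 1]},
                  index=['Red', 'Green', 'Blue'])

# For each column, argsort returns the row positions in ascending rank order.
order = np.argsort(df.to_numpy(), axis=0)

# Index the label array with those positions to get the labels per rank.
labels = df.index.to_numpy()[order]

# Use the rank positions 1..n as the row index instead of the original labels.
ranked = pd.DataFrame(labels, columns=df.columns, index=range(1, len(df) + 1))
print(ranked)
```

This should give the same cell values as df1, with rows labelled 1, 2, 3 rather than Red, Green, Blue.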
