Reputation: 3030
Edit: I edited the example because the previous one could be interpreted in different ways.
I have a dataframe with row labels and in each column a sorting of the labels:
pd.DataFrame({'0': [3,1,2], '1': [2,3,1]}, index=['Red', 'Green', 'Blue'])
It looks like this (real data has more columns):
I want to transform it into a matrix with color names sorted according to the ranks in each column.
For example, the first column is [3, 2, 1]
and the result should be ['Blue', 'Green', 'Red']
.
The second column is [2, 3, 1]
and the result should be ['Blue', 'Red', 'Green']
.
The numbers are the rank of each label in that column. (They are not indices into the labels array.) So if 'Red' has 2, it means it should be in the second cell in the column.
Upvotes: 1
Views: 189
Reputation: 863301
Use Series.sort_values
per columns in DataFrame.apply
:
df1 = df.apply(lambda x: x.sort_values().index)
print (df1)
0 1
Red Blue Blue
Green Green Red
Blue Red Green
Upvotes: 3
Reputation: 3030
I found a solution. I am sure there are better solutions:
# Construct the dataframe
df = pd.DataFrame({'0': [3,1,2], '1': [2,3,1]}, index=['Red', 'Green', 'Blue'])
# Prepare an empty numpy array to hold the answer
a = np.empty([3, 2], dtype=np.dtype('U20'))
# Extract the index labels and duplicate them to match the shape of the answer
index = np.array([df.index.to_numpy(), ]* 2).transpose()
# Get the ranks as a numpy array
ranks = (df.astype('int') -1).to_numpy()
# Use the ranks as the indices in the target array for each label
np.put_along_axis(a, ranks, index, 0)
Upvotes: 0