daramasala
daramasala

Reputation: 3030

Create dataframe with labels sorted according to data

Edit: I edited the example because the previous one could be interpreted in different ways.

I have a dataframe with row labels and in each column a sorting of the labels:

pd.DataFrame({'0': [3,1,2], '1': [2,3,1]}, index=['Red', 'Green', 'Blue'])

It looks like this (real data has more columns):

d

I want to transform it into a matrix with color names sorted according to the ranks in each column.

For example, the first column is [3, 2, 1] and the result should be ['Blue', 'Green', 'Red'].

The second column is [2, 3, 1] and the result should be ['Blue', 'Red', 'Green'].

The numbers are the rank of each label in that column. (They are not indices into the labels array.) So if 'Red' has 2, it means it should be in the second cell in the column.

Upvotes: 1

Views: 189

Answers (2)

jezrael
jezrael

Reputation: 863301

Use Series.sort_values per columns in DataFrame.apply:

df1 = df.apply(lambda x: x.sort_values().index)
print (df1)
           0      1
Red     Blue   Blue
Green  Green    Red
Blue     Red  Green

Upvotes: 3

daramasala
daramasala

Reputation: 3030

I found a solution. I am sure there are better solutions:

# Construct the dataframe
df = pd.DataFrame({'0': [3,1,2], '1': [2,3,1]}, index=['Red', 'Green', 'Blue'])
# Prepare an empty numpy array to hold the answer
a = np.empty([3, 2], dtype=np.dtype('U20'))
# Extract the index labels and duplicate them to match the shape of the answer
index = np.array([df.index.to_numpy(), ]* 2).transpose()
# Get the ranks as a numpy array
ranks = (df.astype('int') -1).to_numpy()
# Use the ranks as the indices in the target array for each label 
np.put_along_axis(a, ranks, index, 0)

Upvotes: 0

Related Questions