Reputation: 4705

How to sort a pandas dataframe on two (or more) different columns, in a particular order

Note

df2 is the only thing that can be used here - using df or df1 would be to use data that isn't possible. The data is received as df2, it wants to be manipulated to the form of df1. Neither df1 or df can be used as part of of the solution (as df1 is the solution).

Setup test data

this is just setup for this post.

# sample data
reps = {1: "dog", 2: "ant", 3: "cat", 6: "orange", 7: "apple", 8: "grape"}
df = pd.DataFrame(
    {"one": [1, 1, 1, 2, 2, 2, 3, 3, 3], "two": [6, 7, 8, 6, 7, 8, 6, 7, 8]}
)
df = df.replace(reps).copy()
df1 = df.copy()
df2 = df.sample(frac=1, random_state=1).replace(reps).reset_index(drop=True)

Problem

Sort df1 so that it is in the same order as df1.

df2:

   one     two
0  cat   grape
1  dog   grape
2  cat  orange
3  cat   apple
4  dog   apple
5  dog  orange
6  ant   apple
7  ant  orange
8  ant   grape

df1

   one     two
0  dog  orange
1  dog   apple
2  dog   grape
3  ant  orange
4  ant   apple
5  ant   grape
6  cat  orange
7  cat   apple
8  cat   grape

Conditions

You can not use df1 as part of a solution, or df, the data is df2, and it needs to be sorted into the ordering of df1.

Attempt

I have tried with pd.Categorical, but couldn't make something work.

order_one = ["dog", "ant", "cat"]
order_two = ["orange", "apple", "grape"]

df2 = (
    df2.groupby(["two"])
    .apply(lambda a: a.iloc[pd.Categorical(a["one"], order_one).argsort()])
    .reset_index(drop=True)
)

df2 = (
    df2.groupby(["one"])
    .apply(lambda a: a.iloc[pd.Categorical(a["two"], order_two).argsort()])
    .reset_index(drop=True)
)

Edit

The solution should be based purely on df2, df1 is just part of the test data and to demonstrate how df2 should be sorted. A solution that uses df1 is not viable as this is the result of sorting df2, I can't use that as part of a solution

Upvotes: 2

Answers (2)

E.J. White

Reputation: 104

Technically you are not sorting, since the order is not ascending or descending or alphabetical. You want to order df_2 with some user-specified ordering. You can do this by generating a numerical index based on your custom order, then sorting by that.

order_one = ["dog", "ant", "cat"]
order_two = ["orange", "apple", "grape"]

# Create dictionaries that define order
order_map_one = dict(zip(order_one, range(len(order_one))))
order_map_two = dict(zip(order_two, range(len(order_two))))

# Generate a temp column that maps numerical rank onto column values
df_2['order_one_rank'] = df['one'].map(order_map_one)
df_2['order_two_rank'] = df['two'].map(order_map_two)

# Sort by these temp columns
df_2.sort_values(['order_one_rank', 'order_two_rank'], inplace=True)

# Then delete the temp columns to recover the original df_2
df_2.drop('order_one_rank', 1, inplace=True)
df_2.drop('order_two_rank', 1, inplace=True)

That should leave df_2 in the order you are looking for.

Upvotes: 1

BENY

Reputation: 323276

Let us try pd.Categorical

df2.one=pd.Categorical(df2.one,categories=df1.one.unique())
df2.two=pd.Categorical(df2.two,categories=df1.two.unique())
df2=df2.sort_values(['one','two'])
df2
   one     two
5  dog  orange
4  dog   apple
1  dog   grape
7  ant  orange
6  ant   apple
8  ant   grape
2  cat  orange
3  cat   apple
0  cat   grape

Make it into function

def yourfunc(x,y):
...     for c in x.columns : 
...         x[c]=pd.Categorical(x[c],categories=y[c].unique())
...     return x.sort_values(x.columns.tolist())
... 
yourfunc(df1,df2)
   one     two
8  cat   grape
6  cat  orange
7  cat   apple
2  dog   grape
0  dog  orange
1  dog   apple
5  ant   grape
3  ant  orange
4  ant   apple

Update

order_fruit = ["orange", "apple", "grape"]
order_animals = ["dog", "ant", "cat"]
def yourfunc(x,y):
...      for c, self in zip(x.columns,y) : 
...          x[c]=pd.Categorical(x[c],categories=self)
...      return x.sort_values(x.columns.tolist())
... 
yourfunc(df2,[order_animals,order_fruit])
   one     two
5  dog  orange
4  dog   apple
1  dog   grape
7  ant  orange
6  ant   apple
8  ant   grape
2  cat  orange
3  cat   apple
0  cat   grape