Reputation: 4705
df2
is the only thing that can be used here - using df
or df1
would be to use data that isn't possible. The data is received as df2
, it wants to be manipulated to the form of df1
. Neither df1
or df
can be used as part of of the solution (as df1
is the solution).
this is just setup for this post.
# sample data
reps = {1: "dog", 2: "ant", 3: "cat", 6: "orange", 7: "apple", 8: "grape"}
df = pd.DataFrame(
{"one": [1, 1, 1, 2, 2, 2, 3, 3, 3], "two": [6, 7, 8, 6, 7, 8, 6, 7, 8]}
)
df = df.replace(reps).copy()
df1 = df.copy()
df2 = df.sample(frac=1, random_state=1).replace(reps).reset_index(drop=True)
Sort df1
so that it is in the same order as df1
.
df2
:
one two
0 cat grape
1 dog grape
2 cat orange
3 cat apple
4 dog apple
5 dog orange
6 ant apple
7 ant orange
8 ant grape
df1
one two
0 dog orange
1 dog apple
2 dog grape
3 ant orange
4 ant apple
5 ant grape
6 cat orange
7 cat apple
8 cat grape
You can not use df1
as part of a solution, or df
, the data is df2
, and
it needs to be sorted into the ordering of df1
.
I have tried with pd.Categorical
, but couldn't make something work.
order_one = ["dog", "ant", "cat"]
order_two = ["orange", "apple", "grape"]
df2 = (
df2.groupby(["two"])
.apply(lambda a: a.iloc[pd.Categorical(a["one"], order_one).argsort()])
.reset_index(drop=True)
)
df2 = (
df2.groupby(["one"])
.apply(lambda a: a.iloc[pd.Categorical(a["two"], order_two).argsort()])
.reset_index(drop=True)
)
The solution should be based purely on df2
, df1
is just part of the test data and to demonstrate how df2
should be sorted. A solution that uses df1
is not viable as this is the result of sorting df2
, I can't use that as part of a solution
Upvotes: 2
Views: 136
Reputation: 104
Technically you are not sorting, since the order is not ascending or descending or alphabetical. You want to order df_2 with some user-specified ordering. You can do this by generating a numerical index based on your custom order, then sorting by that.
order_one = ["dog", "ant", "cat"]
order_two = ["orange", "apple", "grape"]
# Create dictionaries that define order
order_map_one = dict(zip(order_one, range(len(order_one))))
order_map_two = dict(zip(order_two, range(len(order_two))))
# Generate a temp column that maps numerical rank onto column values
df_2['order_one_rank'] = df['one'].map(order_map_one)
df_2['order_two_rank'] = df['two'].map(order_map_two)
# Sort by these temp columns
df_2.sort_values(['order_one_rank', 'order_two_rank'], inplace=True)
# Then delete the temp columns to recover the original df_2
df_2.drop('order_one_rank', 1, inplace=True)
df_2.drop('order_two_rank', 1, inplace=True)
That should leave df_2 in the order you are looking for.
Upvotes: 1
Reputation: 323276
Let us try pd.Categorical
df2.one=pd.Categorical(df2.one,categories=df1.one.unique())
df2.two=pd.Categorical(df2.two,categories=df1.two.unique())
df2=df2.sort_values(['one','two'])
df2
one two
5 dog orange
4 dog apple
1 dog grape
7 ant orange
6 ant apple
8 ant grape
2 cat orange
3 cat apple
0 cat grape
Make it into function
def yourfunc(x,y):
... for c in x.columns :
... x[c]=pd.Categorical(x[c],categories=y[c].unique())
... return x.sort_values(x.columns.tolist())
...
yourfunc(df1,df2)
one two
8 cat grape
6 cat orange
7 cat apple
2 dog grape
0 dog orange
1 dog apple
5 ant grape
3 ant orange
4 ant apple
Update
order_fruit = ["orange", "apple", "grape"]
order_animals = ["dog", "ant", "cat"]
def yourfunc(x,y):
... for c, self in zip(x.columns,y) :
... x[c]=pd.Categorical(x[c],categories=self)
... return x.sort_values(x.columns.tolist())
...
yourfunc(df2,[order_animals,order_fruit])
one two
5 dog orange
4 dog apple
1 dog grape
7 ant orange
6 ant apple
8 ant grape
2 cat orange
3 cat apple
0 cat grape
Upvotes: 2