Reputation: 3299
Appreciate for any hint on how to sort a given multiple columns based on multiples list in pandas as below
import pandas as pd
sort_a=['a','d','e']
sort_b=['s1','s3','s6']
sort_c=['t1','t2','t3']
df=pd.DataFrame(zip(['a', 'e', 'd','a','a','d','e'], ['s3', 's1', 's6','s6','s3','s3','s1'], ['t3', 't2', 't1','t2','t2','t3','t3']),columns=["a", "b", "c"])
with the ordering column, say, a
,c
,d
The idea is something like the sort
df.sort(['a', 'c', 'd'], ascending = [sort_a, sort_c, sort_b])
Expected output
a b c
a s3 t2
a s3 t3
a s6 t2
d s3 t3
d s6 t1
e s1 t2
e s1 t3
Upvotes: 2
Views: 264
Reputation: 3299
While not directly related to sort with reference to a list, but the key question when posting this OP as I wanted to sort columns with string
in it.
Using sort_values for pandas >= 1.1.0
With the new key argument in DataFrame.sort_values, since pandas 1.1.0, we can directly sort a column without setting it as an index using natsort.natsort_keygen:
from natsort import natsort_keygen
df=df.sort_values(
by=['a','b','c'],
key=natsort_keygen()
)
Output:
a b c
4 a s3 t2
0 a s3 t3
3 a s6 t2
5 d s3 t3
2 d s6 t1
1 e s1 t2
6 e s1 t3
Upvotes: 1
Reputation: 28729
One option would be to create categories for each column and sort:
categories = {col : pd.CategoricalDtype(categories=cat, ordered=True)
for col, cat
in zip(df.columns, [sort_a, sort_b, sort_c])}
df.astype(categories).sort_values([*df.columns])
a b c
0 a s3 t3
2 d s6 t1
1 e s1 t2
Upvotes: 3