rpb
rpb

Reputation: 3299

Sorting multiple columns based on list in Pandas

Appreciate for any hint on how to sort a given multiple columns based on multiples list in pandas as below

import pandas as pd
sort_a=['a','d','e']
sort_b=['s1','s3','s6']
sort_c=['t1','t2','t3']
df=pd.DataFrame(zip(['a', 'e', 'd','a','a','d','e'], ['s3', 's1', 's6','s6','s3','s3','s1'], ['t3', 't2', 't1','t2','t2','t3','t3']),columns=["a", "b", "c"])

with the ordering column, say, a,c,d

The idea is something like the sort

df.sort(['a', 'c', 'd'], ascending = [sort_a, sort_c, sort_b])

Expected output

a   b   c
a   s3  t2
a   s3  t3
a   s6  t2
d   s3  t3
d   s6  t1
e   s1  t2
e   s1  t3

Upvotes: 2

Views: 264

Answers (2)

rpb
rpb

Reputation: 3299

While not directly related to sort with reference to a list, but the key question when posting this OP as I wanted to sort columns with string in it.

Using sort_values for pandas >= 1.1.0

With the new key argument in DataFrame.sort_values, since pandas 1.1.0, we can directly sort a column without setting it as an index using natsort.natsort_keygen:

from natsort import natsort_keygen
df=df.sort_values(
    by=['a','b','c'],
    key=natsort_keygen()
)

Output:

   a   b   c
4  a  s3  t2
0  a  s3  t3
3  a  s6  t2
5  d  s3  t3
2  d  s6  t1
1  e  s1  t2
6  e  s1  t3

Upvotes: 1

sammywemmy
sammywemmy

Reputation: 28729

One option would be to create categories for each column and sort:

categories = {col : pd.CategoricalDtype(categories=cat, ordered=True) 
              for col, cat 
              in zip(df.columns, [sort_a, sort_b, sort_c])}

df.astype(categories).sort_values([*df.columns])

   a   b   c
0  a  s3  t3
2  d  s6  t1
1  e  s1  t2

Upvotes: 3

Related Questions