Ordering a dataframe by each column

Question

I have a dataframe that looks like this:

   ID  Age  Score
0   9    5      3
1   4    6      1
2   9    7      2
3   3    2      1
4  12    1     15
5   2   25      6
6   9    5      4
7   9    5     61
8   4    2     12

I want to sort based on the first column, then the second column, and so on.

So I want my output to be this:

   ID  Age  Score
5   2   25      6
3   3    2      1
8   4    2     12
1   4    6      1
0   9    5      3
6   9    5      4
7   9    5     61
2   9    7      2
4  12    1     15

I know I can do the above with df.sort_values(df.columns.to_list()), however I'm worried this might be quite slow for much larger dataframes (in terms of columns and rows).

Is there a more optimal solution?

mozway · Accepted Answer

You can use numpy.lexsort to improve performance.

import numpy as np

a = df.to_numpy()
out = pd.DataFrame(a[np.lexsort(np.rot90(a))],
                   index=df.index, columns=df.columns)

Assuming as input a random square DataFrame of side n:

df = pd.DataFrame(np.random.randint(0, 100, size=(n, n)))

here is the comparison for 100 to 100M items (slower runtime is the best):

Same graph with the speed relative to pandas

Ordering a dataframe by each column

Answers (2)

Related Questions