Pete

Reputation: 514

Order columns of DataFrame according to values

I have the following input:

col1    col2    col3
1       4       0
0       12      2
2       12      4
3       2       1 

I want to order the columns of the DataFrame by values computed from them, e.g. ordering primarily by df[df==0].count() and secondarily by df.sum() would produce the output:

col2    col3    col1
4       0       1
12      2       0
12      4       2
2       1       3 

pd.DataFrame.sort() takes a column object as its argument, which does not apply here, so how can I achieve this?

Upvotes: 1

Views: 1461

Answers (1)

JoeCondron

Reputation: 8906

Firstly, in your expected output the zero count increases from left to right whereas the sum decreases, so I think you need to clarify the sort directions. You can get the number of zeros in each column simply with (df == 0).sum().

To sort by a single aggregate, you can do something like:

col_order = (df == 0).sum().sort(inplace=False).index
df[col_order]
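For readers on current pandas releases, where `sort` has been replaced by `sort_values`, here is a minimal sketch of the same idea using the data from the question. The `kind="mergesort"` argument is my own addition, so that columns with equal zero counts keep their original relative order; the names `zero_counts` and `result` are just illustrative.

```python
import pandas as pd

# Input data copied from the question.
df = pd.DataFrame({
    "col1": [1, 0, 2, 3],
    "col2": [4, 12, 12, 2],
    "col3": [0, 2, 4, 1],
})

# Number of zeros in each column (a Series indexed by column name).
zero_counts = (df == 0).sum()

# Sort the aggregate by value; mergesort is stable, so ties keep
# their original left-to-right order.
col_order = zero_counts.sort_values(kind="mergesort").index

# Reorder the columns of df accordingly.
result = df[col_order]
print(result)
```

Here col2 has no zeros, so it comes first; col1 and col3 each have one zero and stay in their original order.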

This sorts the series of aggregates by its values, and the resulting index is the columns of df in the order you want. Sorting on two sets of values is a little more awkward, but you could do something like:

aggs = pd.DataFrame({'zero_count': (df == 0).sum(), 'sum': df.sum()})
col_order = aggs.sort(['zero_count', 'sum'], inplace=False).index
df[col_order]

Note that the sort method takes an ascending parameter which takes either a Boolean or a list of Booleans of equal length to the number of columns you are sorting on, e.g.

df.sort(['a', 'b'], ascending=[True, False])
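As a rough sketch of how the two-key sort and the ascending list fit together, the following uses `sort_values` (the current pandas name for this method) on the question's data. It assumes the intended order is zero count ascending and sum descending, which is what the expected output in the question appears to show; the variable names are illustrative.

```python
import pandas as pd

# Input data copied from the question.
df = pd.DataFrame({
    "col1": [1, 0, 2, 3],
    "col2": [4, 12, 12, 2],
    "col3": [0, 2, 4, 1],
})

# One row per column of df, one column per aggregate.
aggs = pd.DataFrame({
    "zero_count": (df == 0).sum(),
    "sum": df.sum(),
})

# Assumed directions: fewest zeros first, then largest sum first.
col_order = aggs.sort_values(
    ["zero_count", "sum"], ascending=[True, False]
).index

result = df[col_order]
print(result)
```

With these directions the columns come out as col2, col3, col1, matching the output asked for in the question. If you actually want different directions, flip the corresponding entries in the ascending list.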

Upvotes: 3
