Reorder dataframe rows by custom ordering rule

Question

I have a dataframe of states + DC. They should be ordered by name, but with DISTRICT OF COLUMBIA coming first. Not-in-place method-chaining operations are preferred.

The following works great, and is in the chaining style I prefer. But it seems way too complicated for such a simple operation. Is it possible to do this in a cleaner way?

I start with

>>> import pandas as pd
>>> states = pd.DataFrame({
    'state_name': ['ALABAMA', 'DISTRICT OF COLUMBIA', 'WYOMING'],
    'population': [1000, 2000, 3000]
})


>>> states
   population            state_name
0        1000               ALABAMA
1        2000  DISTRICT OF COLUMBIA
2        3000               WYOMING

and do

>>> (
    states
    # astype('category', ordered=True) is no longer accepted by current pandas;
    # an ordered CategoricalDtype expresses the same thing
    .assign(state_name=lambda x: x.state_name.astype(pd.CategoricalDtype(ordered=True)))
    .assign(state_name=lambda x: x.state_name.cat.reorder_categories(
        ['DISTRICT OF COLUMBIA']
        + x.state_name.cat.categories.drop('DISTRICT OF COLUMBIA').tolist())
    )
    .sort_values('state_name')
)

to get

   population            state_name
1        2000  DISTRICT OF COLUMBIA
0        1000               ALABAMA
2        3000               WYOMING

piRSquared · Accepted Answer

Here's what you do:

Create a boolean series with states.state_name.ne('DISTRICT OF COLUMBIA'). It is False for 'DISTRICT OF COLUMBIA' and True for everything else.
If we sort this boolean series, the False comes first and all the True values come after. If we use a stable sort, the True rows keep their existing relative order, so this relies on the frame already being in name order. mergesort is a stable sort.
We don't want the sorted booleans themselves, though. argsort returns the permutation of positions that would sort the series, and passing that permutation to iloc reorders the rows of the dataframe.

That's a lot of words to describe one line:

states.iloc[states.state_name.ne('DISTRICT OF COLUMBIA').argsort(kind='mergesort')]

   population            state_name
1        2000  DISTRICT OF COLUMBIA
0        1000               ALABAMA
2        3000               WYOMING
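
To see why the one-liner works, it may help to look at the intermediate values one at a time. This is only a sketch of the same steps unpacked; the names not_dc, order and result are mine and are not part of the original answer.

```python
import pandas as pd

states = pd.DataFrame({
    'state_name': ['ALABAMA', 'DISTRICT OF COLUMBIA', 'WYOMING'],
    'population': [1000, 2000, 3000],
})

# False only for the DC row, True for every other row
not_dc = states.state_name.ne('DISTRICT OF COLUMBIA')

# Stable argsort: False sorts before True, and rows that compare equal
# keep their original relative order
order = not_dc.argsort(kind='mergesort')

# iloc takes the permutation of positions and reorders the rows
result = states.iloc[order.to_numpy()]

print(order.tolist())
print(result)
```

Here not_dc is [True, False, True], so the stable permutation puts position 1 (the DC row) first and leaves the other rows in the order they already had.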

You could also add a helper column to use in sort_values, then drop it again:

states.eval(
    'dc = state_name != "DISTRICT OF COLUMBIA"', inplace=False
).sort_values('dc', kind='mergesort').drop(columns='dc')

   population            state_name
1        2000  DISTRICT OF COLUMBIA
0        1000               ALABAMA
2        3000               WYOMING
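
Both versions above assume the rows are already in name order, because the stable sort only preserves whatever order is there. If that isn't guaranteed, one possible variation (my own sketch, not something from the approaches above) is to sort on the helper flag and the name together. The column name not_dc is hypothetical, and the input is deliberately shuffled to show the effect.

```python
import pandas as pd

# Deliberately not in alphabetical order
states = pd.DataFrame({
    'state_name': ['WYOMING', 'DISTRICT OF COLUMBIA', 'ALABAMA'],
    'population': [3000, 2000, 1000],
})

result = (
    states
    # Helper flag: False for DC, True for everything else
    .assign(not_dc=lambda x: x.state_name.ne('DISTRICT OF COLUMBIA'))
    # Sort by the flag first (False before True), then by name within each group
    .sort_values(['not_dc', 'state_name'])
    # Remove the helper column again
    .drop(columns='not_dc')
)

print(result)
```

This stays in the method-chaining style and doesn't depend on a stable sort, since the name column breaks ties explicitly.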
