Hatshepsut

Reputation: 6662

Reorder dataframe rows by custom ordering rule

I have a dataframe of states + DC. They should be ordered by name, but with DISTRICT OF COLUMBIA coming first. Not-in-place method-chaining operations are preferred.

The following works great, and is in the chaining style I prefer. But it seems way too complicated for such a simple operation. Is it possible to do this in a cleaner way?

I start with

>>> import pandas as pd
>>> states = pd.DataFrame({
    'state_name': ['ALABAMA', 'DISTRICT OF COLUMBIA', 'WYOMING'],
    'population': [1000, 2000, 3000]
})


>>> states
   population            state_name
0        1000               ALABAMA
1        2000  DISTRICT OF COLUMBIA
2        3000               WYOMING

and do

>>> (
    states
    # astype('category', ordered=True) is no longer accepted by pandas;
    # pass an ordered CategoricalDtype instead
    .assign(state_name=lambda x: x.state_name.astype(pd.CategoricalDtype(ordered=True)))
    .assign(state_name=lambda x: x.state_name.cat.reorder_categories(
        ['DISTRICT OF COLUMBIA']
        + x.state_name.cat.categories.drop('DISTRICT OF COLUMBIA').tolist()
    ))
    .sort_values('state_name')
)

to get

   population            state_name
1        2000  DISTRICT OF COLUMBIA
0        1000               ALABAMA
2        3000               WYOMING

Upvotes: 1

Views: 74

Answers (1)

piRSquared

Reputation: 294488

Here's what you do:

  • Create a boolean series with states.state_name.ne('DISTRICT OF COLUMBIA'). It is False for 'DISTRICT OF COLUMBIA' and True for everything else.
  • If we sort this boolean series, the False comes first and all the True values come after. With a stable sort, the True rows keep their existing relative order, so states that are already in alphabetical order stay that way. mergesort is a stable sort.
  • We don't want the sorted booleans themselves, though. argsort returns the permutation of row positions that produces that sort, and iloc applies the permutation to the dataframe.

That is a lot of words to describe this one-liner:

states.iloc[states.state_name.ne('DISTRICT OF COLUMBIA').argsort(kind='mergesort')]

   population            state_name
1        2000  DISTRICT OF COLUMBIA
0        1000               ALABAMA
2        3000               WYOMING
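
The steps above can be broken out to show the intermediate values. This is only a sketch: the names not_dc, order, and result are made up for illustration, and the mask is converted with to_numpy() first so that argsort returns a plain array of positions (the one-liner calls argsort on the Series directly).

```python
import pandas as pd

states = pd.DataFrame({
    'state_name': ['ALABAMA', 'DISTRICT OF COLUMBIA', 'WYOMING'],
    'population': [1000, 2000, 3000],
})

# Step 1: False for DC, True for every other row.
not_dc = states.state_name.ne('DISTRICT OF COLUMBIA')

# Step 2: positions that would sort the mask. The stable sort keeps
# the True rows in their original relative order.
order = not_dc.to_numpy().argsort(kind='mergesort')

# Step 3: apply the permutation by position.
result = states.iloc[order]
print(result)
```

Because the sort is stable, this only moves DC to the front. It relies on the remaining rows already being in alphabetical order.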

You could also add a temporary column to use in sort_values:

states.eval(
    'dc = state_name != "DISTRICT OF COLUMBIA"', inplace=False
).sort_values('dc', kind='mergesort').drop(columns='dc')

   population            state_name
1        2000  DISTRICT OF COLUMBIA
0        1000               ALABAMA
2        3000               WYOMING
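
Both versions above assume the rows are already sorted by name. If that isn't guaranteed, one possible variation (not part of the approaches above, just a sketch) is to sort on the helper column and the name together. The column name is_not_dc and the constant DC are hypothetical names chosen for this example.

```python
import pandas as pd

DC = 'DISTRICT OF COLUMBIA'

# Deliberately unsorted input, to show that both sort keys matter.
states = pd.DataFrame({
    'state_name': ['WYOMING', 'ALABAMA', DC],
    'population': [3000, 1000, 2000],
})

result = (
    states
    # Hypothetical helper column: False for DC, True for everything else.
    .assign(is_not_dc=lambda x: x.state_name.ne(DC))
    # False sorts before True, so DC comes first; the rest follow by name.
    .sort_values(['is_not_dc', 'state_name'])
    .drop(columns='is_not_dc')
)
print(result)
```

This stays in the chaining style and does not modify states in place. No stable sort is needed here because the second key fully determines the order of the remaining rows.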

Upvotes: 1
