I have a DataFrame of the states plus DC. It should be ordered by name, but with DISTRICT OF COLUMBIA coming first. Not-in-place, method-chaining operations are preferred.
The following works great, and is in the chaining style I prefer. But it seems way too complicated for such a simple operation. Is it possible to do this in a cleaner way?
I start with
>>> import pandas as pd
>>> states = pd.DataFrame({
...     'state_name': ['ALABAMA', 'DISTRICT OF COLUMBIA', 'WYOMING'],
...     'population': [1000, 2000, 3000],
... })
>>> states
population state_name
0 1000 ALABAMA
1 2000 DISTRICT OF COLUMBIA
2 3000 WYOMING
and do
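>>> (
...     states
...     # turn state_name into an ordered categorical (categories start out alphabetical)
...     .assign(state_name=lambda x: x.state_name.astype('category').cat.as_ordered())
...     # move DISTRICT OF COLUMBIA to the front of the category order
...     .assign(state_name=lambda x: x.state_name.cat.reorder_categories(
...         ['DISTRICT OF COLUMBIA']
...         + x.state_name.cat.categories.drop('DISTRICT OF COLUMBIA').tolist()))
...     # a categorical column sorts by its category order, not alphabetically
...     .sort_values('state_name')
... )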
to get
population state_name
1 2000 DISTRICT OF COLUMBIA
0 1000 ALABAMA
2 3000 WYOMING
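For comparison, here is a more compact sketch of the same categorical idea (not from the original post; it assumes pandas 0.21 or later for pd.CategoricalDtype, and the DC constant is just an illustrative helper). It builds the full category order, DISTRICT OF COLUMBIA first and the remaining names sorted, inside a single assign. Like the code above, it leaves state_name as a categorical column.

```python
import pandas as pd

states = pd.DataFrame({
    'state_name': ['ALABAMA', 'DISTRICT OF COLUMBIA', 'WYOMING'],
    'population': [1000, 2000, 3000],
})

DC = 'DISTRICT OF COLUMBIA'  # illustrative helper constant

result = (
    states
    # One step: DC first, then every other name in alphabetical order,
    # as an ordered categorical dtype.
    .assign(state_name=lambda x: x.state_name.astype(pd.CategoricalDtype(
        [DC] + sorted(set(x.state_name) - {DC}), ordered=True)))
    # sort_values orders a categorical column by its category order.
    .sort_values('state_name')
)
print(result)
```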
Here's what you do:
1. states.state_name.ne('DISTRICT OF COLUMBIA') is False for 'DISTRICT OF COLUMBIA' and True for everything else.
2. Sorting puts False before True (False sorts as 0, True as 1). With a stable sort, all the True rows also keep their current relative order, so the other states stay in the order they already have (alphabetical in this example). mergesort is a stable sort.
3. Use argsort to get the permutation that represents that sort, and pass it to iloc.

That's a lot of words to describe this:
states.iloc[states.state_name.ne('DISTRICT OF COLUMBIA').argsort(kind='mergesort')]
population state_name
1 2000 DISTRICT OF COLUMBIA
0 1000 ALABAMA
2 3000 WYOMING
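The three steps above can be traced one at a time. This sketch rebuilds the example frame and prints each intermediate value; the names not_dc, order and result are illustrative only.

```python
import pandas as pd

states = pd.DataFrame({
    'state_name': ['ALABAMA', 'DISTRICT OF COLUMBIA', 'WYOMING'],
    'population': [1000, 2000, 3000],
})

# Step 1: boolean mask, False for DC and True for every other row.
not_dc = states.state_name.ne('DISTRICT OF COLUMBIA')
print(not_dc.tolist())  # [True, False, True]

# Step 2: a stable argsort lists the False positions first and keeps
# the True positions in their original relative order.
order = not_dc.argsort(kind='mergesort')
print(order.tolist())  # [1, 0, 2]

# Step 3: iloc reorders the rows by those positions.
result = states.iloc[order]
print(result)
```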
You could also add a temporary column to sort on with sort_values, then drop it:
# dc is False only for DC, so a stable sort on it moves DC to the top
states.eval(
    'dc = state_name != "DISTRICT OF COLUMBIA"', inplace=False
).sort_values('dc', kind='mergesort').drop(columns='dc')
population state_name
1 2000 DISTRICT OF COLUMBIA
0 1000 ALABAMA
2 3000 WYOMING
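Both approaches above only move DC to the top and keep the other rows in their existing order, so they rely on the frame already being sorted by name. If it might not be, one option (a different technique from the answer's, assuming pandas 1.1 or later for the key argument of sort_values) is to sort with a key that maps DC to an empty string, which sorts before any non-empty name. The unsorted input below is made-up data for illustration.

```python
import pandas as pd

DC = 'DISTRICT OF COLUMBIA'

# Hypothetical, deliberately unsorted input: the names really get sorted,
# they are not just kept in their original order.
states = pd.DataFrame({
    'state_name': ['WYOMING', 'DISTRICT OF COLUMBIA', 'ALABAMA'],
    'population': [3000, 2000, 1000],
})

result = states.sort_values(
    'state_name',
    # The key sees the state_name column, maps DC to '' (which sorts first)
    # and leaves other names unchanged; the stored values are not modified.
    key=lambda s: s.where(s.ne(DC), ''),
)
print(result)
```

This stays in the method-chaining style, adds no helper column and does not change the column's dtype.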