Reputation: 435
I've a pandas dataframe with columns, department and employee_count. I need to sort the employee_count column in descending order. But if there is a tie between 2 employee_counts then they should be sorted alphabetically based on department.
Department Employee_Count
0 abc 10
1 adc 10
2 bca 11
3 cde 9
4 xyz 15
Required output:
Department Employee_Count
0 xyz 15
1 bca 11
2 abc 10
3 adc 10
4 cde 9
This is what I've tried.
df = df.sort_values(['Department','Employee_Count'],ascending=[True,False])
But this just sorts the departments alphabetically.
I've also tried to sort by Department first and then by Employee_Count. Like this:
df = df.sort_values(['Department'],ascending=[True])
df = df.sort_values(['Employee_Count'],ascending=[False])
This doesn't give me correct output either:
Department Employee_Count
4 xyz 15
2 bca 11
1 adc 10
0 abc 10
3 cde 9
It gives 'adc' first and then 'abc'.
Upvotes: 13
Views: 21202
Reputation: 23061
It's a good 4 years too late but OP's initial attempt would work if the sorting was stable. Pandas sort_values()
uses 'quicksort'
by default which is not guaranteed to be stable. However, if the second sort_values()
call used a 'stable'
sort, it would produce the expected output.
df = df.sort_values('Department', ascending=True)
df = df.sort_values('Employee_Count', kind='stable', ascending=False, ignore_index=True)
# ^^^^^^^^^^^^^ <--- stable sort here
You can verify that for any dataframe, it produces the same result as sorting by a list of two columns:
df = pd.DataFrame(np.random.randint(10, size=(1000, 2)), columns=['A', 'B'])
a = df.sort_values('A', ascending=True).sort_values('B', kind='stable', ascending=False, ignore_index=True)
b = df.sort_values(['B', 'A'], ascending=[False, True], ignore_index=True)
a.equals(b) # True
Upvotes: 0
Reputation: 862511
You can swap columns in list and also values in ascending
parameter:
Explanation:
Order of columns names is order of sorting, first sort descending by Employee_Count
and if some duplicates in Employee_Count
then sorting by Department
only duplicates rows ascending.
df1 = df.sort_values(['Employee_Count', 'Department'], ascending=[False, True])
print (df1)
Department Employee_Count
4 xyz 15
2 bca 11
0 abc 10 <-
1 adc 10 <-
3 cde 9
Or for test if use second False
then duplicated rows are sorting descending
:
df2 = df.sort_values(['Employee_Count', 'Department',],ascending=[False, False])
print (df2)
Department Employee_Count
4 xyz 15
2 bca 11
1 adc 10 <-
0 abc 10 <-
3 cde 9
Upvotes: 18