Reputation: 5060
I'm looking for a way to sort pandas DataFrame. pd.DataFrame.sort_values
doesn't accept a key function, pre-1.0. I can convert it to list and apply a key to sorted
function, but that will be slow. The other way seems something related to categorical index. I don't have a fixed number of rows so I don't know if categorical index will be applicable.
I have given an example case of what kind of data I want to sort:
Input DataFrame:
clouds fluff
0 {[} 1
1 >>> 2
2 {1 3
3 123 4
4 AAsda 5
5 aad 6
Output DataFrame:
clouds fluff
0 >>> 2
1 {[} 1
2 {1 3
3 123 4
4 aad 6
5 AAsda 5
The rule for sorting (priority):
First special characters (sort by ascii among themselves)
Next is by numbers
next is by lower case alphabets (lexicographical)
next is Capital case alphabets (lexicographical)
In plain python I'd do it like
from functools import cmp_to_key
def ks(a, b):
# "Not exactly this but similar"
if a.isupper():
return -1
else:
return 1
Case
sorted(['aa', 'AA', 'dd', 'DD'], key=cmp_to_key(ks))
Answer:
['DD', 'AA', 'aa', 'dd']
How would you do it with Pandas?
Upvotes: 11
Views: 27421
Reputation: 81
As of pandas 1.2.0, I did this
import numpy as np
import pandas as pd
df = pd.DataFrame(['aa', 'dd', 'DD', 'AA'], columns=["data"])
# This is the sorting rule
rule = {
"DD": 1,
"AA": 10,
"aa": 20,
"dd": 30,
}
def particular_sort(series):
"""
Must return one Series
"""
return series.apply(lambda x: rule.get(x, 1000))
new_df = df.sort_values(by=["data"], key=particular_sort)
print(new_df) # DD, AA, aa, dd
Of course, you can do this too, but it may be difficult to understand,smile
new_df = df.sort_values(by=["data"], key=lambda x: x.apply(lambda y: rule.get(y, 1000)))
print(new_df) # DD, AA, aa, dd
Upvotes: 8
Reputation: 5060
As of pandas 1.1.0, pandas.DataFrame.sort_values
accepts an argument key
with type callable.
So in this case we would use:
df.sort_values(by='clouds', key=kf)
where kf
is the key function that operates on type Series. Accepts and returns Series.
Upvotes: 8
Reputation: 13485
This seems to work:
def sort_dataframe_by_key(dataframe: DataFrame, column: str, key: Callable) -> DataFrame:
""" Sort a dataframe from a column using the key """
sort_ixs = sorted(np.arange(len(dataframe)), key=lambda i: key(dataframe.iloc[i][column]))
return DataFrame(columns=list(dataframe), data=dataframe.iloc[sort_ixs].values)
It passes tests:
def test_sort_dataframe_by_key():
dataframe = DataFrame([{'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 1, 'c': 1}, {'a': 3, 'b': 4, 'c': 0}])
assert sort_dataframe_by_key(dataframe, column='a', key=lambda x: x).equals(
DataFrame([{'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 1, 'c': 1}, {'a': 3, 'b': 4, 'c': 0}]))
assert sort_dataframe_by_key(dataframe, column='a', key=lambda x: -x).equals(
DataFrame([{'a': 3, 'b': 4, 'c': 0}, {'a': 2, 'b': 1, 'c': 1}, {'a': 1, 'b': 2, 'c': 3}]))
assert sort_dataframe_by_key(dataframe, column='b', key=lambda x: -x).equals(
DataFrame([{'a': 3, 'b': 4, 'c': 0}, {'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 1, 'c': 1}]))
assert sort_dataframe_by_key(dataframe, column='c', key=lambda x: x).equals(
DataFrame([{'a': 3, 'b': 4, 'c': 0}, {'a': 2, 'b': 1, 'c': 1}, {'a': 1, 'b': 2, 'c': 3}]))
Upvotes: 2
Reputation: 4506
This might be useful, yet still not sure about special characters! can they actally be sorted!!
import pandas as pd
a = [2, 'B', 'c', 1, 'a', 'b',3, 'C', 'A']
df = pd.DataFrame({"a": a})
df['upper'] = df['a'].str.isupper()
df['lower'] = df['a'].str.islower()
df['int'] = df['a'].apply(isinstance,args = [int])
df2 = pd.concat([df[df['int'] == True].sort_values(by=['a']),
df[df['lower'] == True].sort_values(by=['a']),
df[df['upper'] == True].sort_values(by=['a'])])
print(df2)
a upper lower int
3 1 NaN NaN True
0 2 NaN NaN True
6 3 NaN NaN True
4 a False True False
5 b False True False
2 c False True False
8 A True False False
1 B True False False
7 C True False False
you can also do it in one step with creating new True False columns!
a = [2, 'B', 'c', 1, 'a', 'b',3, 'C', 'A']
df = pd.DataFrame({"a": a})
df2 = pd.concat([df[df['a'].apply(isinstance,args = [int])].sort_values(by=['a']),
df[df['a'].str.islower() == True].sort_values(by=['a']),
df[df['a'].str.isupper() == True].sort_values(by=['a'])])
a
3 1
0 2
6 3
4 a
5 b
2 c
8 A
1 B
7 C
Upvotes: 2