Reputation: 18694
I have a DataFrame
with UPPERCASE
column names from a database that I'd like to sort. Some of them are prefixed with an underscore _
. The default sorting puts them at the end. Do I have to convert them to lowercase, then sort, then convert them back to uppercase or is there an easier way?
Here's some sample data. When the values of cl2
are lowercase then the order is correct. With uppercase not so much.
pd.DataFrame({
"cl1": ["foo", "foo", "foo"],
"cl2": ["_FOO", "BAR", "BAZ"]}
).sort_values(["cl1", "cl2"])
Upvotes: 1
Views: 427
Reputation: 8904
I had similar peculiar results.
For me, I had a list of tuples I was trying to sort on a compound key.
I ended up using replace to remove underscores in the sort itself as follows:
dl = sorted(dl, key=lambda t: (t[0].replace('_',''),t[1]))
Otherwise I was getting odd results like:
coming before:
It's bizarre how Python sort works.
I tried both of the results here in conjunction with here to come up with this.
Upvotes: 1
Reputation: 261914
If conversion to lowercase works for you, you can use it as a key:
pd.DataFrame({
"cl1": ["foo", "foo", "foo"],
"cl2": ["_FOO", "BAR", "BAZ"]}
).sort_values(["cl1", "cl2"], key=lambda s: s.str.lower())
Other option, sort by the strings without _
, then the presence of _
:
(pd.DataFrame({
"cl1": ["foo", "foo", "foo"],
"cl2": ["_FOO", "BAR", "BAZ"]})
.sort_values(["cl1", "cl2"], key=lambda s: s.str.lstrip('_'))
.sort_values(["cl1", "cl2"], key=lambda s: ~s.str.startswith('_'), kind='stable')
)
output:
cl1 cl2
0 foo _FOO
1 foo BAR
2 foo BAZ
Upvotes: 1