Sort underscore before upercase letters

Question

I have a DataFrame with UPPERCASE column names from a database that I'd like to sort. Some of them are prefixed with an underscore _. The default sorting puts them at the end. Do I have to convert them to lowercase, then sort, then convert them back to uppercase or is there an easier way?

Here's some sample data. When the values of cl2 are lowercase then the order is correct. With uppercase not so much.

pd.DataFrame({
   "cl1": ["foo", "foo", "foo"], 
   "cl2": ["_FOO", "BAR", "BAZ"]}
).sort_values(["cl1", "cl2"])

mozway · Accepted Answer

If conversion to lowercase works for you, you can use it as a key:

pd.DataFrame({
   "cl1": ["foo", "foo", "foo"], 
   "cl2": ["_FOO", "BAR", "BAZ"]}
).sort_values(["cl1", "cl2"], key=lambda s: s.str.lower())

Other option, sort by the strings without _, then the presence of _:

(pd.DataFrame({
   "cl1": ["foo", "foo", "foo"], 
   "cl2": ["_FOO", "BAR", "BAZ"]})
   .sort_values(["cl1", "cl2"], key=lambda s: s.str.lstrip('_'))
   .sort_values(["cl1", "cl2"], key=lambda s: ~s.str.startswith('_'), kind='stable')
)

output:

   cl1   cl2
0  foo  _FOO
1  foo   BAR
2  foo   BAZ

Sort underscore before upercase letters

Answers (2)

Related Questions