Fluxy
Fluxy

Reputation: 2978

How to sort values in DataFrame for both numerical and string values?

I have the following pandas DataFrame with mixed data types: string and integer values. I want to sort values of this DataFrame in descending order using multiple columns: Price and Name. The string values (i.e. Name) should be sorted in the alphabetical order, or actually can be ignored at all, because the most important ones are numerical values.

The problem is that the list of target columns can contain both string and integer columns, e.g. target_columns = ["Price","Name"]

d = {'1': ['25', 'AAA', 2], '2': ['30', 'BBB', 3], '3': ['5', 'CCC', 2], \
     '4': ['300', 'DDD', 2], '5': ['30', 'DDD', 3],  '6': ['100', 'AAA', 3]}

columns=['Price', 'Name', 'Class']

target_columns = ['Price', 'Name']
order_per_cols = [False] * len(target_columns)

df = pd.DataFrame.from_dict(data=d, orient='index')
df.columns = columns
df.sort_values(list(target_columns), ascending=order_per_cols, inplace=True)

Currently, this code fails with the following message:

TypeError: '<' not supported between instances of 'str' and 'int'

The expected output:

Price    Name    Class
300      DDD     2
100      AAA     3
30       DDD     3
30       BBB     3
25       AAA     2
5        CCC     2

Upvotes: 3

Views: 4969

Answers (2)

Akash Singh
Akash Singh

Reputation: 1

One more solution could be -

Using 'by' parameter in sort_values function

d = ({'1': ['25', 'AAA', 2], '2': ['30', 'BBB', 3], '3': ['5', 'CCC', 2], \
     '4': ['300', 'DDD', 2], '5': ['30', 'DDD', 3],  '6': ['100', 'AAA', 3]})

df = pd.DataFrame.from_dict(data=d,columns=['Price','Name','Class'],orient='index')
df['Price'] = pd.to_numeric(df['Price'])
df.sort_values(**by** = ['Price','Name'],ascending=False)

Upvotes: 0

Erfan
Erfan

Reputation: 42916

If I understand you correctly, you want a generic way that excludes the object columns from your selection.

We can use DataFrame.select_dtypes for this, then sort on the numeric columns:

# df['Price'] = pd.to_numeric(df['Price'])
numeric = df[target_columns].select_dtypes('number').columns.tolist()
df = df.sort_values(numeric, ascending=[False]*len(numeric))
   Price Name  Class
4    300  DDD      2
6    100  AAA      3
2     30  BBB      3
5     30  DDD      3
1     25  AAA      2
3      5  CCC      2

Upvotes: 1

Related Questions