jax

Reputation: 4197

How to sort/group a Pandas DataFrame by class label or any specific column

class col2 col3 col4 col5
1     4    5    5    5
4     4    4.5  5.5  6
1     3.5  5    6    4.5
3     3    4    4    4
2     3    3.5  3.8  6.1

I have used hypothetical data in the example. The shape of the real DataFrame is 6680x1900. I have clustered these data into 50 labeled classes (1 to 50). How can I sort this data in ascending order of class labels?

I have tried:

df.groupby([column_name_lst])["class"]

But it fails with this error:

TypeError: You have to supply one of 'by' and 'level'

How can I solve this problem? The expected output is:

class col2 col3 col4 col5
1     4    5    5    5
1     3.5  5    6    4.5
2     3    3.5  3.8  6.1
3     3    4    4    4
4     4    4.5  5.5  6

Upvotes: 4

Views: 14700

Answers (2)

jezrael

Reputation: 863291

I think you can use DataFrame.sort_values if class is a column (a Series):

print (type(df['class']))
<class 'pandas.core.series.Series'>


print (df.sort_values(by='class'))
   class  col2  col3  col4  col5
0      1   4.0   5.0   5.0   5.0
2      1   3.5   5.0   6.0   4.5
4      2   3.0   3.5   3.8   6.1
3      3   3.0   4.0   4.0   4.0
1      4   4.0   4.5   5.5   6.0
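For reference, here is a minimal self-contained sketch of the same sort. It rebuilds the small hypothetical frame from the question and assumes class is an ordinary column. The kind="mergesort" argument is an addition not in the original answer: it makes the sort stable, so rows within the same class keep their original relative order.

```python
import pandas as pd

# Hypothetical sample data copied from the question.
df = pd.DataFrame({
    "class": [1, 4, 1, 3, 2],
    "col2": [4, 4, 3.5, 3, 3],
    "col3": [5, 4.5, 5, 4, 3.5],
    "col4": [5, 5.5, 6, 4, 3.8],
    "col5": [5, 6, 4.5, 4, 6.1],
})

# mergesort is a stable algorithm: ties on "class" keep their input order.
sorted_df = df.sort_values(by="class", kind="mergesort")

# Optional: renumber the rows 0..n-1 once they are sorted.
sorted_reset = sorted_df.reset_index(drop=True)

print(sorted_df)
```

sort_values returns a new DataFrame, so assign the result if you want to keep it.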

If you also need groupby, use the by parameter:

print (df.groupby(by='class').sum())
       col2  col3  col4  col5
class                        
1       7.5  10.0  11.0   9.5
2       3.0   3.5   3.8   6.1
3       3.0   4.0   4.0   4.0
4       4.0   4.5   5.5   6.0
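If the goal is to work with each class separately rather than to aggregate, a sketch like the following may help. It assumes the same hypothetical sample frame, with class as a column. The names groups and class_sizes are only illustrative. By default groupby sorts the group keys (sort=True), so the classes come out in ascending order.

```python
import pandas as pd

# Hypothetical sample data copied from the question.
df = pd.DataFrame({
    "class": [1, 4, 1, 3, 2],
    "col2": [4, 4, 3.5, 3, 3],
    "col3": [5, 4.5, 5, 4, 3.5],
    "col4": [5, 5.5, 6, 4, 3.8],
    "col5": [5, 6, 4.5, 4, 6.1],
})

# Iterating over a GroupBy yields (label, sub-frame) pairs,
# ordered by label because sort=True is the default.
groups = {label: group for label, group in df.groupby("class")}

# Number of rows in each class.
class_sizes = df.groupby("class").size()

for label, group in groups.items():
    print(label, len(group))
```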

And if class is the index, use Kartik's solution:

print (df.index)
Int64Index([1, 4, 1, 3, 2], dtype='int64', name='class')

print (df.sort_index())
       col2  col3  col4  col5
class                        
1       4.0   5.0   5.0   5.0
1       3.5   5.0   6.0   4.5
2       3.0   3.5   3.8   6.1
3       3.0   4.0   4.0   4.0
4       4.0   4.5   5.5   6.0

If you also need groupby, use the level parameter:

print (df.groupby(level='class').sum())
       col2  col3  col4  col5
class                        
1       7.5  10.0  11.0   9.5
2       3.0   3.5   3.8   6.1
3       3.0   4.0   4.0   4.0
4       4.0   4.5   5.5   6.0

Or pass the index itself, although the previous solution is better because it is more general:

print (df.groupby(df.index).sum())
       col2  col3  col4  col5
class                        
1       7.5  10.0  11.0   9.5
2       3.0   3.5   3.8   6.1
3       3.0   4.0   4.0   4.0
4       4.0   4.5   5.5   6.0

Upvotes: 4

Kartik

Reputation: 8703

If you are starting with the data in your question:

class col2 col3 col4 col5
1     4    5    5    5
4     4    4.5  5.5  6
1     3.5  5    6    4.5
3     3    4    4    4
2     3    3.5  3.8  6.1

and you want to sort it, then it depends on whether 'class' is the index or a column. If it is the index:

df.sort_index()

should give you the answer. If it is a column, follow the answer by @jezrael.
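If you are not sure which case applies, the two branches can be combined in a small helper. This is only a sketch: sort_by_label is a hypothetical name, not a pandas function, and the sample frame is the hypothetical data from the question. The stable mergesort is an assumption added to keep ties in their original order.

```python
import pandas as pd


def sort_by_label(df, label="class"):
    """Sort df by `label`, whether it is a column or the index name."""
    if label in df.columns:
        # Label is an ordinary column: sort by its values.
        return df.sort_values(by=label, kind="mergesort")
    if df.index.name == label:
        # Label is the (single-level) index: sort by the index.
        return df.sort_index(kind="mergesort")
    raise KeyError(f"{label!r} is neither a column nor the index name")


# Hypothetical sample data copied from the question.
as_column = pd.DataFrame({
    "class": [1, 4, 1, 3, 2],
    "col2": [4, 4, 3.5, 3, 3],
    "col3": [5, 4.5, 5, 4, 3.5],
})
as_index = as_column.set_index("class")

sorted_column = sort_by_label(as_column)
sorted_index = sort_by_label(as_index)
print(sorted_index)
```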

Upvotes: 1
