clwen

Reputation: 20909

Sort DataFrame by occurrence in one column, while preserving order in other columns

I would like to sort a DataFrame in a similar fashion to this SO question: Sorting entire csv by frequency of occurence in one column

However, the counts are not guaranteed to be unique, and when two clusters have the same count their rows can end up interleaved (I'm using the method suggested by EdChum in the above question).
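A minimal sketch of how ties can interleave. It assumes the earlier approach boils down to sorting on a per-row count column alone (a hypothetical reconstruction, not necessarily EdChum's exact code), and it uses made-up rows whose clusters are not contiguous in the input:

```python
import pandas as pd

# Hypothetical rows: clusters 1 and 2 alternate in the input.
df = pd.DataFrame({
    "cluster_id": [1, 2, 1, 2, 7, 7, 7],
    "distance": [0.15, 0.05, 0.25, 0.10, 0.1, 0.2, 0.3],
})

# Per-row count of how many rows share that row's cluster_id.
df["count"] = df.groupby("cluster_id")["cluster_id"].transform("count")

# Sort on the count alone (stable, so tied rows keep their input order).
by_count_only = df.sort_values("count", ascending=False, kind="stable")

# Clusters 1 and 2 both have count 2, so nothing pulls their rows together.
print(by_count_only["cluster_id"].tolist())
```

Because clusters 1 and 2 tie on the count, their rows stay alternated after this sort; a secondary key on cluster_id is what keeps each cluster together.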

Given the following DataFrame:

cluster_id,distance,url
1,0.15,aaa.com
1,0.25,bbb.com
2,0.05,ccc.com
2,0.10,ccc.com
7,0.1,abc.com
7,0.2,def.com
7,0.3,xyz.com

Afterwards, I would like it to be:

cluster_id,distance,url
7,0.1,abc.com
7,0.2,def.com
7,0.3,xyz.com
1,0.15,aaa.com
1,0.25,bbb.com
2,0.05,ccc.com
2,0.10,ccc.com

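Note that after sorting by the occurrence count of cluster_id, clusters with the same count are still ordered by cluster_id, and within each cluster the rows (and therefore distance) keep their original order.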
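The target ordering can be stated as one sort key: larger clusters first, ties broken by ascending cluster_id, and rows inside a cluster left in input order. A minimal sketch in plain Python (standard library only, with the sample data pasted in as a string) that relies on sorted() being stable:

```python
import csv
import io
from collections import Counter

# Sample data from the question, pasted in as CSV text.
DATA = """cluster_id,distance,url
1,0.15,aaa.com
1,0.25,bbb.com
2,0.05,ccc.com
2,0.10,ccc.com
7,0.1,abc.com
7,0.2,def.com
7,0.3,xyz.com
"""

rows = list(csv.DictReader(io.StringIO(DATA)))

# How many rows each cluster_id has.
counts = Counter(row["cluster_id"] for row in rows)

# Key: larger clusters first, ties broken by numeric cluster_id.
# sorted() is stable, so rows inside one cluster keep their input order.
ordered = sorted(rows, key=lambda r: (-counts[r["cluster_id"]], int(r["cluster_id"])))

for r in ordered:
    print(r["cluster_id"], r["distance"], r["url"])
```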

Upvotes: 0

Views: 1382

Answers (2)

Amir

Reputation: 141

For a DataFrame g:

  pno  dn
0   A  AA
1   B  BB
2   A  AA

to sort in ascending order of how often each dn value occurs:

g.assign(G=g.groupby('dn').dn.transform('count')).sort_values(['G','dn'],ascending=[True,False]).drop(columns='G')

  pno  dn
1   B  BB
0   A  AA
2   A  AA
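A self-contained version of the snippet above, with the example frame g rebuilt by hand from the printed values (the construction is an assumption; the answer only shows the printed frame):

```python
import pandas as pd

# Example frame from this answer, rebuilt by hand.
g = pd.DataFrame({"pno": ["A", "B", "A"], "dn": ["AA", "BB", "AA"]})

out = (
    g.assign(G=g.groupby("dn")["dn"].transform("count"))  # rows per dn value
    .sort_values(["G", "dn"], ascending=[True, False])      # rarest dn first
    .drop(columns="G")  # keyword form; pandas 2.x rejects a positional axis here
)
print(out)
```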

Upvotes: 0

BENY

Reputation: 323276

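We can add a helper column G holding the size of each row's cluster, sort by G (descending) and then cluster_id (ascending), and drop G at the end: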

df.assign(G=df.groupby('cluster_id').cluster_id.transform('count')).sort_values(['G','cluster_id'],ascending=[False,True]).drop(columns='G')
Out[248]: 
   cluster_id  distance      url
4           7      0.10  abc.com
5           7      0.20  def.com
6           7      0.30  xyz.com
0           1      0.15  aaa.com
1           1      0.25  bbb.com
2           2      0.05  ccc.com
3           2      0.10  ccc.com
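If you would rather not depend on how sort_values orders rows that tie on every sort key, one option is to add each row's original position as a final, always-unique key. A minimal sketch on the question's data; the helper column names G and pos are arbitrary choices:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "cluster_id": [1, 1, 2, 2, 7, 7, 7],
    "distance": [0.15, 0.25, 0.05, 0.10, 0.1, 0.2, 0.3],
    "url": ["aaa.com", "bbb.com", "ccc.com", "ccc.com",
            "abc.com", "def.com", "xyz.com"],
})

result = (
    df.assign(
        G=df.groupby("cluster_id")["cluster_id"].transform("count"),  # cluster size
        pos=np.arange(len(df)),  # original row position, unique per row
    )
    # Biggest cluster first, ties by cluster_id, then original order.
    .sort_values(["G", "cluster_id", "pos"], ascending=[False, True, True])
    .drop(columns=["G", "pos"])
)
print(result)
```

Because no two rows share a pos value, the final order is fully determined by the keys rather than by the sort algorithm's handling of ties.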

Upvotes: 3
