Reputation: 20909
I would like to sort a DataFrame in a similar fashion to this SO question: Sorting entire csv by frequency of occurence in one column
However, one issue I'm encountering is that the counts are not guaranteed to be unique, and in that case rows from different groups get interleaved (I'm using the method suggested by EdChum in the question above)
Given the following DataFrame:
cluster_id,distance,url
1,0.15,aaa.com
1,0.25,bbb.com
2,0.05,ccc.com
2,0.10,ccc.com
7,0.1,abc.com
7,0.2,def.com
7,0.3,xyz.com
After sorting, I would like it to be:
cluster_id,distance,url
7,0.1,abc.com
7,0.2,def.com
7,0.3,xyz.com
1,0.15,aaa.com
1,0.25,bbb.com
2,0.05,ccc.com
2,0.10,ccc.com
Note that the cluster_id and distance columns keep their original within-group order after sorting by the number of occurrences of cluster_id
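For reproduction, the sample above can be rebuilt with something like this (a minimal sketch using io.StringIO; df is simply the variable name used in the answer below):
import io
import pandas as pd

# Rebuild the sample data shown above
csv_data = """cluster_id,distance,url
1,0.15,aaa.com
1,0.25,bbb.com
2,0.05,ccc.com
2,0.10,ccc.com
7,0.1,abc.com
7,0.2,def.com
7,0.3,xyz.com"""
df = pd.read_csv(io.StringIO(csv_data))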
Upvotes: 0
Views: 1382
Reputation: 141
Given a small DataFrame g:
  pno  dn
0   A  AA
1   B  BB
2   A  AA
to sort in ascending order of how often each dn value occurs:
g.assign(G=g.groupby('dn').dn.transform('count')).sort_values(['G','dn'],ascending=[True,False]).drop(columns='G')
pno dn
1 B BB
0 A AA
2 A AA
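The pattern is: attach the per-group count as a temporary column, sort on it (plus the grouping column), then drop it again. The same idea as a small reusable helper, purely an illustrative sketch (the name sort_by_frequency and the '_n' column are made up):
import pandas as pd

def sort_by_frequency(frame, col, ascending=False):
    # Per-group row count, aligned with the original rows
    counts = frame.groupby(col)[col].transform('count')
    # Sort by the count first, then by the column itself; drop the helper afterwards
    return (frame.assign(_n=counts)
                 .sort_values(['_n', col], ascending=[ascending, True])
                 .drop(columns='_n'))
For example, sort_by_frequency(g, 'dn', ascending=True) reproduces the frame above.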
Upvotes: 0
Reputation: 323276
We can sort by cluster_id and a new helper column 'G':
df.assign(G=df.groupby('cluster_id').cluster_id.transform('count')).sort_values(['G','cluster_id'],ascending=[False,True]).drop(columns='G')
Out[248]:
cluster_id distance url
4 7 0.10 abc.com
5 7 0.20 def.com
6 7 0.30 xyz.com
0 1 0.15 aaa.com
1 1 0.25 bbb.com
2 2 0.05 ccc.com
3 2 0.10 ccc.com
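On pandas 1.1.0 or newer, roughly the same ordering can be had without the helper column by passing a key to sort_values; this is only a sketch, not part of the original answer. mergesort is stable, so rows within each cluster keep their original order, which also avoids the interleaving the question worries about:
# Sort by how often each cluster_id occurs (descending); mergesort keeps
# the original row order within each cluster
df.sort_values('cluster_id',
               key=lambda s: s.map(s.value_counts()),
               ascending=False,
               kind='mergesort')
Here ties between equally frequent clusters keep their original row order, which happens to match the desired output above.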
Upvotes: 3