Reputation: 20909
I would like to sort a DataFrame in a similar fashion to this SO question: Sorting entire csv by frequency of occurence in one column
However, one issue I'm encountering is that the counts are not guaranteed to be unique, and in that case rows from different groups get interleaved (I'm using the method suggested by EdChum in the question above)
Given the following DataFrame:
cluster_id,distance,url
1,0.15,aaa.com
1,0.25,bbb.com
2,0.05,ccc.com
2,0.10,ccc.com
7,0.1,abc.com
7,0.2,def.com
7,0.3,xyz.com
After sorting, I would like it to be:
cluster_id,distance,url
7,0.1,abc.com
7,0.2,def.com
7,0.3,xyz.com
1,0.15,aaa.com
1,0.25,bbb.com
2,0.05,ccc.com
2,0.10,ccc.com
Note that the cluster_id and distance columns keep their original within-group order after sorting by the number of occurrences of cluster_id
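For reproduction, the sample above can be rebuilt with something like this (a minimal sketch using io.StringIO; df is simply the variable name used in the answer below):
import io
import pandas as pd

# Rebuild the sample data shown above
csv_data = """cluster_id,distance,url
1,0.15,aaa.com
1,0.25,bbb.com
2,0.05,ccc.com
2,0.10,ccc.com
7,0.1,abc.com
7,0.2,def.com
7,0.3,xyz.com"""
df = pd.read_csv(io.StringIO(csv_data))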
Upvotes: 0
Views: 1382
Reputation: 141
Given a small DataFrame g:
  pno  dn
0   A  AA
1   B  BB
2   A  AA
to sort in ascending order of how often each dn value occurs:
g.assign(G=g.groupby('dn').dn.transform('count')).sort_values(['G','dn'],ascending=[True,False]).drop(columns='G')
pno dn
1 B BB
0 A AA
2 A AA
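The pattern is: attach the per-group count as a temporary column, sort on it (plus the grouping column), then drop it again. The same idea as a small reusable helper, purely an illustrative sketch (the name sort_by_frequency and the '_n' column are made up):
import pandas as pd

def sort_by_frequency(frame, col, ascending=False):
    # Per-group row count, aligned with the original rows
    counts = frame.groupby(col)[col].transform('count')
    # Sort by the count first, then by the column itself; drop the helper afterwards
    return (frame.assign(_n=counts)
                 .sort_values(['_n', col], ascending=[ascending, True])
                 .drop(columns='_n'))
For example, sort_by_frequency(g, 'dn', ascending=True) reproduces the frame above.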
Upvotes: 0
Reputation: 323276
We can sort by cluster_id and a new helper column 'G':
df.assign(G=df.groupby('cluster_id').cluster_id.transform('count')).sort_values(['G','cluster_id'],ascending=[False,True]).drop(columns='G')
Out[248]:
cluster_id distance url
4 7 0.10 abc.com
5 7 0.20 def.com
6 7 0.30 xyz.com
0 1 0.15 aaa.com
1 1 0.25 bbb.com
2 2 0.05 ccc.com
3 2 0.10 ccc.com
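On pandas 1.1.0 or newer, roughly the same ordering can be had without the helper column by passing a key to sort_values; this is only a sketch, not part of the original answer. mergesort is stable, so rows within each cluster keep their original order, which also avoids the interleaving the question worries about:
# Sort by how often each cluster_id occurs (descending); mergesort keeps
# the original row order within each cluster
df.sort_values('cluster_id',
               key=lambda s: s.map(s.value_counts()),
               ascending=False,
               kind='mergesort')
Here ties between equally frequent clusters keep their original row order, which happens to match the desired output above.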
Upvotes: 3