Reputation: 26027
I do have a dataframe like this:
df = pd.DataFrame({
'IDs': list('abcdefgh'),
'Val': [
'foo', 'bar', 'foo', 'abc', 'bar', 'bar', 'foo', 'foo'
]
})
IDs Val
0 a foo
1 b bar
2 c foo
3 d abc
4 e bar
5 f bar
6 g foo
7 h foo
I now want to get an output like this:
Val IDs
foo a
c
g
h
bar b
e
f
abc d
So, it is the index of a multiindex dataframe which is ordered accoring to the size
of each group in Val
.
I currently do it like this:
df['groupsize'] = df.groupby('Val')['IDs'].transform('size')
df = (
df.sort_values(['groupsize', 'Val', 'IDs'], ascending=[False, True, True])
.drop('groupsize', axis=1)
.set_index(['Val', 'IDs'])
)
df.to_excel('example.xlsx', merge_cells=True)
which gives the desired output.
Is there a way to achieve the same output but without creating this intermediate column groupsize
which is dropped later anyway?
Upvotes: 2
Views: 199
Reputation: 25269
Use set_index
and value_counts
df.set_index('Val').loc[df.Val.value_counts().index]
Out[44]:
IDs
Val
foo a
foo c
foo g
foo h
bar b
bar e
bar f
abc d
If you need multiindex, just chain addition set_index
with append=True
df.set_index('Val').loc[df.Val.value_counts().index].set_index('IDs', append=True)
Upvotes: 1
Reputation: 59274
You can use np.argsort
and iloc
to avoid the verbose sort_values
s = np.argsort(-df.groupby('Val')['IDs'].transform('size'))
df.iloc[s].set_index(['Val', 'IDs'])
Val IDs
foo a
c
g
h
bar b
e
f
abc d
Upvotes: 1