Jim

Reputation: 1869

Pandas apply sort_values on GroupBy object does not return a grouped DataFrame

I don't understand why the following code is not working. I have the following dataframe:

import numpy as np
import pandas as pd

ind = pd.MultiIndex.from_tuples([(2, 9), (2, 0), (3, 15), (3, 8), (2, 28), (2, 15), (2, 10), (3, 9)], names=['A', 'B'])

values = [0.2719, 0.2938, 0.3281, 0.3310, 0.3323, 0.3640, 0.3647, 0.5218]

df = pd.DataFrame(data=values, index=ind, columns=['values'])


Applying sort_values within a groupby appears to do nothing:

df.groupby('A').apply(lambda x: x.sort_values(by='values'))


Note that the values are already globally sorted.

Now, when I swap just two rows, thereby destroying the prior global sort order, it suddenly works:

df1 = df.iloc[np.r_[1,0,2:len(df)]]
df1.groupby('A').apply(lambda x: x.sort_values(by='values'))


This is the result I would also expect from the first snippet.

Upvotes: 2

Views: 1008

Answers (1)

gherka

Reputation: 1446

The docs don't say a great deal about the combine part of split-apply-combine:

GroupBy will examine the results of the apply step and try to return a sensibly combined result.

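In the first example, each group is already sorted by values, so the lambda returns every group with the same rows in the same order. Since neither the number of rows nor their order changes, apply behaves more like transform, which returns a "like-indexed object", and the result looks just like the input.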
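One way to check this reasoning is to ask whether each group is already sorted before apply runs. The following is a minimal sketch, not a description of pandas internals; the helper name groups_already_sorted is made up for illustration, and df1 is the row-swapped frame from the question.

```python
import numpy as np
import pandas as pd

ind = pd.MultiIndex.from_tuples(
    [(2, 9), (2, 0), (3, 15), (3, 8), (2, 28), (2, 15), (2, 10), (3, 9)],
    names=["A", "B"],
)
values = [0.2719, 0.2938, 0.3281, 0.3310, 0.3323, 0.3640, 0.3647, 0.5218]
df = pd.DataFrame(data=values, index=ind, columns=["values"])

# Same frame with the first two rows swapped, as in the question.
df1 = df.iloc[np.r_[1, 0, 2:len(df)]]


def groups_already_sorted(frame):
    """Hypothetical helper: per group of A, is the 'values' column non-decreasing?"""
    return frame.groupby("A")["values"].apply(lambda s: s.is_monotonic_increasing)


sorted_df = groups_already_sorted(df)    # every group is already in order
sorted_df1 = groups_already_sorted(df1)  # group A=2 is out of order after the swap

print(sorted_df)
print(sorted_df1)
```

For df, both groups report True, so sorting inside apply hands each group back unchanged. For df1, group A=2 reports False, so the lambda really does reorder rows, which matches the case where the output changed.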

I think if what you want is a nested sort, you can just pass a list to sort_values directly, like so:

df.sort_values(["A", "values"])
      values
A B         
2 9   0.2719
  0   0.2938
  28  0.3323
  15  0.3640
  10  0.3647
3 15  0.3281
  8   0.3310
  9   0.5218

Upvotes: 1
