skgbanga

Reputation: 2667

pandas groupby apply returning a dataframe

Consider the following code:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randint(0, 4, 16).reshape(4, 4), columns=list('ABCD'))
>>> df
   A  B  C  D
0  2  1  0  2
1  3  0  2  2
2  0  2  0  2
3  2  1  2  0
>>> def grouper(frame):
...     return frame
...
>>> df.groupby('A').apply(grouper)
   A  B  C  D
0  2  1  0  2
1  3  0  2  2
2  0  2  0  2
3  2  1  2  0

As you can see, the result is identical to the original dataframe. Here is what the documentation says about apply:

The function passed to apply must take a dataframe as its first argument and return a DataFrame, Series or scalar. apply will then take care of combining the results back together into a single dataframe or series. apply is therefore a highly flexible grouping method.

groupby splits the dataframe into one small dataframe per value of A, like this:

   A  B  C  D
2  0  2  0  2

   A  B  C  D
0  2  1  0  2
3  2  1  2  0

   A  B  C  D
1  3  0  2  2

The apply documentation says that it combines these dataframes back into a single dataframe. What I don't understand is how it combines them so that the final result matches the original dataframe. If it had simply used concat, the result would have been:

   A  B  C  D
2  0  2  0  2
0  2  1  0  2
3  2  1  2  0
1  3  0  2  2

So how is this combination actually done?

Upvotes: 2

Views: 1167

Answers (1)

keiv.fly

Reputation: 4005

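If you look at the source code, you will see a flag, not_indexed_same, that records whether the index of the applied results differs from the index of the original groups. When the index is unchanged, as it is with your grouper, groupby reindexes the concatenated result to the original index before returning it, which is why the rows come back in their original order. I do not know why it was implemented this way.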

The change was made on Aug 21, 2011 and Wes made no comments on the change: https://github.com/pandas-dev/pandas/commit/00c8da0208553c37ca6df0197da431515df813b7#diff-720d374f1a709d0075a1f0a02445cd65
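
To see the effect of that reindexing step without going through apply, here is a minimal sketch that does the two steps by hand. It is not the pandas internals, only an illustration under my own assumptions: the frame is hard-coded with the values from the question, and the names pieces, concatenated and restored are mine. Note that what apply itself returns can depend on the pandas version and on the group_keys argument, so this only mirrors the behaviour shown in the question.

```python
import pandas as pd

# Same values as the example frame in the question (hard-coded so the run is reproducible).
df = pd.DataFrame(
    {"A": [2, 3, 0, 2], "B": [1, 0, 2, 1], "C": [0, 2, 0, 2], "D": [2, 2, 2, 0]}
)

# Iterating over a groupby yields (key, sub-frame) pairs, sorted by key by default.
pieces = [group for _, group in df.groupby("A")]

# Plain concatenation keeps the group order, so the rows come back shuffled.
concatenated = pd.concat(pieces)
print(concatenated.index.tolist())  # [2, 0, 3, 1]

# Reindexing to the original index restores the original row order.
restored = concatenated.reindex(df.index)
print(restored.equals(df))  # True
```

The concatenated frame is the one the question expected from a plain concat, and the reindexed frame is the one apply actually returned there.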

Upvotes: 4
