Bubblethan

Reputation: 379

DataFrame groupby produces a group twice

I'm using groupby on a pandas DataFrame, but the function I pass to transform unexpectedly receives a DataFrame as well as Series:

import numpy as np
import pandas as pd

new = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8])
# Group rows in consecutive pairs: (0, 1), (2, 3), ...
new.groupby(np.arange(len(new)) // 2).transform(lambda x: print(x, type(x), '***'))

The output is

0    1
1    2
Name: 0, dtype: int64 <class 'pandas.core.series.Series'> ***
   0
0  1
1  2 <class 'pandas.core.frame.DataFrame'> ***
2    3
3    4
Name: 0, dtype: int64 <class 'pandas.core.series.Series'> ***
4    5
5    6
Name: 0, dtype: int64 <class 'pandas.core.series.Series'> ***
6    7
7    8
Name: 0, dtype: int64 <class 'pandas.core.series.Series'> ***

But I expected

0    1
1    2
Name: 0, dtype: int64 <class 'pandas.core.series.Series'> ***
2    3
3    4
Name: 0, dtype: int64 <class 'pandas.core.series.Series'> ***
4    5
5    6
Name: 0, dtype: int64 <class 'pandas.core.series.Series'> ***
6    7
7    8
Name: 0, dtype: int64 <class 'pandas.core.series.Series'> ***

Where does that DataFrame object come from?

Upvotes: 1

Views: 44

Answers (2)

meph

Reputation: 682

This worked for me:

new.groupby(np.arange(len(new))//2)[0].transform(lambda x: print(x,'****'))

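Notice the [0]: selecting the column before calling transform gives a SeriesGroupBy, so the function receives each group as a Series.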

The output will look like this (this run appears to come from a 12-row frame, with x.shape printed as well):

0    1
1    2
Name: 0, dtype: int64 **** (2,)
2    3
3    4
Name: 1, dtype: int64 **** (2,)
4    5
5    6
Name: 2, dtype: int64 **** (2,)
6    7
7    8
Name: 3, dtype: int64 **** (2,)
8     9
9    10
Name: 4, dtype: int64 **** (2,)
10    11
11    12
Name: 5, dtype: int64 **** (2,)
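
To check what the function receives once the column is selected, the argument types can be recorded instead of printed. This is a minimal sketch, assuming a recent pandas version; `record_type`, `seen`, `keys`, and `grouped` are names made up for illustration.

```python
import numpy as np
import pandas as pd

new = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8])
keys = np.arange(len(new)) // 2  # four groups of two rows each

seen = []  # type name of every object handed to the function

def record_type(x):
    # Hypothetical helper: note what pandas passed in, then return it unchanged.
    seen.append(type(x).__name__)
    return x

# Selecting column 0 first yields a SeriesGroupBy rather than a DataFrameGroupBy.
grouped = new.groupby(keys)[0]
result = grouped.transform(record_type)

print(type(grouped).__name__)
print(set(seen))
```

If only Series shows up in `seen`, the function never received a DataFrame on this code path.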

Upvotes: 0

filbranden

Reputation: 8898

See the docs for apply(), which include this section:

Notes:

In the current implementation apply calls func twice on the first column/row to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first column/row.

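You called transform() rather than apply(), but transform() runs your function through similar machinery, and it turns out this caveat applies to both DataFrame and GroupBy objects.

That explains why you're seeing the first group twice, first as a Series and then as a DataFrame: as far as I can tell, pandas tries the function on the first group both column by column (each column as a Series) and on the whole group (as a DataFrame) to decide which path to use for the remaining groups.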
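
The extra call can be observed without print by recording the type of each argument. This is a minimal sketch, assuming a recent pandas version; `record_type`, `seen`, and `keys` are names made up for illustration, and how many calls happen (and with which types) is an implementation detail that can vary between versions.

```python
import numpy as np
import pandas as pd

new = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8])
keys = np.arange(len(new)) // 2  # four groups of two rows each

seen = []  # type name of every object handed to the function

def record_type(x):
    # Hypothetical helper: note what pandas passed in, then return it unchanged.
    seen.append(type(x).__name__)
    return x

result = new.groupby(keys).transform(record_type)

print(seen)  # one entry per call to record_type
print(len(seen), "calls for", len(set(keys)), "groups")
```

If `seen` contains both Series and DataFrame, or has more entries than there are groups, that is the first group being probed more than once. Because of this, a function passed to transform() is safest when it has no side effects.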

Upvotes: 1
