Reputation: 379
I'm using groupby on a pandas DataFrame, but the function I pass to transform unexpectedly receives a DataFrame:
import numpy as np
import pandas as pd
new = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8])
new.groupby(np.arange(len(new)) // 2).transform(lambda x: print(x, type(x), '***'))
The outcome is
0 1
1 2
Name: 0, dtype: int64 <class 'pandas.core.series.Series'> ***
0
0 1
1 2 <class 'pandas.core.frame.DataFrame'> ***
2 3
3 4
Name: 0, dtype: int64 <class 'pandas.core.series.Series'> ***
4 5
5 6
Name: 0, dtype: int64 <class 'pandas.core.series.Series'> ***
6 7
7 8
Name: 0, dtype: int64 <class 'pandas.core.series.Series'> ***
But my expectation is
0 1
1 2
Name: 0, dtype: int64 <class 'pandas.core.series.Series'> ***
2 3
3 4
Name: 0, dtype: int64 <class 'pandas.core.series.Series'> ***
4 5
5 6
Name: 0, dtype: int64 <class 'pandas.core.series.Series'> ***
6 7
7 8
Name: 0, dtype: int64 <class 'pandas.core.series.Series'> ***
Where does that dataframe object come from?
Upvotes: 1
Views: 44
Reputation: 682
This worked for me:
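new.groupby(np.arange(len(new)) // 2)[0].transform(lambda x: print(x, '****', x.shape))
Notice the [0]: it selects column 0 before transform, so each group is passed to the function as a Series.
The output (here from a frame with 12 rows) will be: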
0 1
1 2
Name: 0, dtype: int64 **** (2,)
2 3
3 4
Name: 1, dtype: int64 **** (2,)
4 5
5 6
Name: 2, dtype: int64 **** (2,)
6 7
7 8
Name: 3, dtype: int64 **** (2,)
8 9
9 10
Name: 4, dtype: int64 **** (2,)
10 11
11 12
Name: 5, dtype: int64 **** (2,)
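To check this without relying on printed output, here is a minimal sketch that records what the function receives once column 0 is selected. The names `record` and `seen` are made up for illustration, and the function returns its input unchanged so that transform gets a valid result back.

```python
import numpy as np
import pandas as pd

new = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8])
keys = np.arange(len(new)) // 2  # group labels 0, 0, 1, 1, 2, 2, 3, 3

seen = []  # (type name, shape) of each object handed to the function

def record(x):
    # Hypothetical helper: note what arrived, then return it unchanged.
    seen.append((type(x).__name__, x.shape))
    return x

# Selecting column 0 first gives a SeriesGroupBy.
result = new.groupby(keys)[0].transform(record)
print(seen)
```

In my understanding every entry should be a Series of shape (2,), with no DataFrame among them.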
Upvotes: 0
Reputation: 8898
See the docs for apply(), which include this section:
Notes:
In the current implementation apply calls func twice on the first column/row to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first column/row.
You called transform(), but that is just another version of apply(). Also, it turns out this caveat exists for both DataFrame and GroupBy objects.
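That explains why you're seeing the first group twice: first as a Series (column 0 of the group) and then as a DataFrame (the whole group), because transform() tries both code paths on the first group before settling on one.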
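If the goal is just to look at each group, a sketch like the following may help. It first records what transform hands to the function, then iterates over the GroupBy object directly, which visits each group exactly once. The names `record`, `calls` and `visited` are invented for this example, and how many extra calls transform makes (and with which types) can differ between pandas versions, so no exact sequence is claimed here.

```python
import numpy as np
import pandas as pd

new = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8])
keys = np.arange(len(new)) // 2  # group labels 0, 0, 1, 1, 2, 2, 3, 3

calls = []  # type name of every object that transform passes to the function

def record(x):
    # Hypothetical helper: the append is the "side effect" being counted.
    calls.append(type(x).__name__)
    return x  # identity, so the transform result stays valid

new.groupby(keys).transform(record)
print(len(calls), calls)  # can be more than the 4 groups; version-dependent

# Side-effect-safe alternative: iterating yields each (key, group) pair once.
visited = [(int(k), type(g).__name__) for k, g in new.groupby(keys)]
print(visited)
```

Iterating keeps side effects such as printing or logging out of transform entirely, so it does not matter how often the function would have been called.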
Upvotes: 1