Reputation: 30026
I would like to use groupby on my dataframe and then chain a series of function calls on each group with apply.
As a first prototype, I've set up an example where I convert the entries of my dataframe from string to numeric. The dataframe looks like this:
import pandas as pd

frame = pd.DataFrame({
    "number": ["1", "2", "3", "4", "5", "6", "7", "8"],
    "type": ["a"] * 4 + ["b"] * 4})
The resulting dataframe is:
  number type
0      1    a
1      2    a
2      3    a
3      4    a
4      5    b
5      6    b
6      7    b
7      8    b
The numbers in this dataframe are strings. So before I can use any math operations, they have to be converted to a numerical type. That's what I would like to do with apply:
frame.groupby("type")["number"].apply(pd.to_numeric)
But the result is a single series which contains all items:
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
Name: number, dtype: int64
I've read the docs for this. Apparently you can use transform or apply.
In the samples, the grouped structure seems to be kept.
Maybe it is something related to pd.to_numeric? So I tried:
frame.groupby("type")["number"].apply(lambda x: int(x))
Which results in a TypeError:
TypeError: cannot convert the series to <class 'int'>
Apparently the apply gets a whole group as parameter. The results for each group seem to be concatenated into one dataframe.
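To check that observation, here is a minimal sketch that records what the function passed to apply actually receives. The names `received`, `to_int`, and `error` are made up for illustration, and the exact index of the result may differ between pandas versions.

```python
import pandas as pd

frame = pd.DataFrame({
    "number": ["1", "2", "3", "4", "5", "6", "7", "8"],
    "type": ["a"] * 4 + ["b"] * 4,
})

received = []  # (type name, length) of every argument passed by apply


def to_int(group):
    # `group` is a whole Series holding one group, not a single string.
    received.append((type(group).__name__, len(group)))
    # astype converts every element; int(group) would raise instead.
    return group.astype(int)


result = frame.groupby("type")["number"].apply(to_int)
print(received)
print(result.tolist())

# int() on a Series with more than one element raises the TypeError
# from the question.
try:
    int(frame["number"])
except TypeError as exc:
    error = str(exc)
    print(error)
```

Under these assumptions, every recorded argument is a Series of length 4, which is why a scalar converter such as int fails while a vectorized conversion such as astype works.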
Is it possible to use apply in a way that keeps the grouped structure? I would like a call that applies the function to each column within the groups and keeps the groups. Then I could chain the calls.
A related question I've found is this: pandas: sample groups after groupby
But the answer suggests applying the function before the grouping, which doesn't work well with chaining the functions, and not at all for something like mean().
Upvotes: 3
Views: 763
Reputation: 434
The messages and behaviors you are getting here are because you are in fact calling:
pd.core.groupby.SeriesGroupBy.apply(self, func, *args, **kwargs)
and not Series.apply or DataFrame.apply.
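The difference can be illustrated with a small sketch, assuming the frame from the question. The variable names `elementwise` and `group_sizes` are only illustrative.

```python
import pandas as pd

frame = pd.DataFrame({
    "number": ["1", "2", "3", "4", "5", "6", "7", "8"],
    "type": ["a"] * 4 + ["b"] * 4,
})

# Series.apply calls the function once per element, so int works here.
elementwise = frame["number"].apply(int)
print(elementwise.tolist())

# SeriesGroupBy.apply calls the function once per group and passes the
# group as a Series, so the function can ask for its length.
group_sizes = frame.groupby("type")["number"].apply(lambda s: len(s))
print(group_sizes.to_dict())
```

The same function name, apply, therefore hands a scalar to the function in the first call and a Series in the second.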
But the result is a single series which contains all items:
It seems to correspond with case #3 described here.
Apparently the apply gets a whole group as parameter.
Yes
The results for each group seem to be concatenated into one dataframe.
Depends on the case linked above
Is it possible to use apply in a way that keeps the grouped structure? I would like a call that applies the function to each column within the groups and keeps the groups. Then I could chain the calls.
You would have to give more details on what you are trying to achieve, but aggregate or transform seem like good candidates indeed.
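As a rough sketch of how those two could look on the example frame (one possible approach, not necessarily what the asker needs; the names `converted`, `means_chained`, and `means_agg` are hypothetical):

```python
import pandas as pd

frame = pd.DataFrame({
    "number": ["1", "2", "3", "4", "5", "6", "7", "8"],
    "type": ["a"] * 4 + ["b"] * 4,
})

# transform returns a result aligned with the original index, so it can
# be grouped again by the same key for the next step in a chain.
converted = frame.groupby("type")["number"].transform(pd.to_numeric)
means_chained = converted.groupby(frame["type"]).mean()
print(means_chained.to_dict())

# aggregate (agg) reduces each group to one value; the conversion and
# the reduction can be combined in a single function.
means_agg = frame.groupby("type")["number"].agg(
    lambda s: pd.to_numeric(s).mean()
)
print(means_agg.to_dict())
```

Because transform preserves the original index, regrouping by frame["type"] is one way to keep chaining per-group steps. agg is the better fit when the last step reduces each group to a single value.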
Upvotes: 1