lhk

pandas, keep groupby groups after apply

I would like to use groupby on my dataframe and then chain a series of function calls on each group with apply.

As a first prototype, I've set up an example where I convert the entries of my dataframe from string to numeric. The dataframe looks like this:

import pandas as pd

frame = pd.DataFrame({
    "number": ["1", "2", "3", "4", "5", "6", "7", "8"],
    "type": ["a"] * 4 + ["b"] * 4})

The resulting dataframe is:

[Image: structure of the dataframe]

The numbers in this dataframe are strings. So before I can use any math operations, they have to be converted to a numerical type. That's what I would like to do with apply:

frame.groupby("type")["number"].apply(pd.to_numeric)

But the result is a single series which contains all items:

0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
Name: number, dtype: int64

I've read the docs on this. Apparently you can use either transform or apply, and in the examples the grouped structure seems to be kept.
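For comparison, a minimal sketch of the transform variant on the same frame (rebuilt here so the snippet runs on its own). transform returns a result indexed like the input rows, so it yields one converted value per original row. Note that it is still a plain Series, not a GroupBy object, so it does not keep the groups by itself either.

```python
import pandas as pd

frame = pd.DataFrame({
    "number": ["1", "2", "3", "4", "5", "6", "7", "8"],
    "type": ["a"] * 4 + ["b"] * 4,
})

# pd.to_numeric is called once per group (each group is a Series);
# transform puts the pieces back together in the original row order.
converted = frame.groupby("type")["number"].transform(pd.to_numeric)

print(converted.index.equals(frame.index))  # True
print(converted.tolist())  # [1, 2, 3, 4, 5, 6, 7, 8]
```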

Maybe it has something to do with pd.to_numeric? So I tried:

frame.groupby("type")["number"].apply(lambda x: int(x))

Which results in a TypeError:

TypeError: cannot convert the series to

Apparently the apply gets a whole group as parameter. The results for each group seem to be concatenated into one dataframe.
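The behavior described above can be checked with a small sketch (same frame, rebuilt so the snippet is self-contained): the function receives each group as a whole Series, so int() on it raises TypeError, while converting the Series with astype(int) works. When the function reduces each group to a single value, the result is indexed by the group key, so the groups do show up in the output in that case.

```python
import pandas as pd

frame = pd.DataFrame({
    "number": ["1", "2", "3", "4", "5", "6", "7", "8"],
    "type": ["a"] * 4 + ["b"] * 4,
})

grouped = frame.groupby("type")["number"]

# Each call receives a whole group (a Series of 4 strings here),
# so calling int() on it fails.
try:
    grouped.apply(lambda s: int(s))
except TypeError as exc:
    print("TypeError:", exc)

# Converting the whole Series inside the function works; returning one
# value per group gives a result indexed by the "type" key.
sums = grouped.apply(lambda s: s.astype(int).sum())
print(sums.index.tolist())  # ['a', 'b']
print(sums.tolist())  # [10, 26]
```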

Is it possible to use apply in a way that keeps the grouped structure? I would like a call that applies the function to each column within the groups and keeps the groups. Then I could chain the calls.

A related question I've found is this: pandas: sample groups after groupby

But the answer suggests applying the function before grouping, which doesn't work well when chaining the functions, and not at all for something like mean().
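One possible way to keep a single method chain, sketched under the assumption that the goal is a per-group conversion followed by an aggregation: do the per-group step with transform inside DataFrame.assign, which hands back a full DataFrame, then group again on the same key for the next step. The grouping is rebuilt rather than kept, but the chain stays intact.

```python
import pandas as pd

frame = pd.DataFrame({
    "number": ["1", "2", "3", "4", "5", "6", "7", "8"],
    "type": ["a"] * 4 + ["b"] * 4,
})

means = (
    frame
    # per-group conversion; transform returns one value per original row
    .assign(number=lambda d: d.groupby("type")["number"].transform(pd.to_numeric))
    # regroup on the same key for the next step in the chain
    .groupby("type")["number"]
    .mean()
)

print(means.tolist())  # [2.5, 6.5]
```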

Answers (1)

Phik

The messages and behavior you are seeing occur because you are in fact calling pd.core.groupby.SeriesGroupBy.apply(self, func, *args, **kwargs), not Series.apply or DataFrame.apply.
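A small sketch of that difference on the question's frame: Series.apply calls the function once per element, whereas SeriesGroupBy.apply calls it once per group and passes that group's Series.

```python
import pandas as pd

frame = pd.DataFrame({
    "number": ["1", "2", "3", "4", "5", "6", "7", "8"],
    "type": ["a"] * 4 + ["b"] * 4,
})

# Series.apply: int() is called on each string, one element at a time.
elementwise = frame["number"].apply(int)

# SeriesGroupBy.apply: the function is called once per group with a Series.
per_group = frame.groupby("type")["number"].apply(lambda s: type(s).__name__)

print(elementwise.tolist())  # [1, 2, 3, 4, 5, 6, 7, 8]
print(per_group.tolist())  # ['Series', 'Series']
```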

"But the result is a single series which contains all items."

It seems to correspond to case #3 described here.

"Apparently the apply gets a whole group as parameter."

Yes.

"The results for each group seem to be concatenated into one dataframe."

That depends on which of the cases linked above applies.

"Is it possible to use apply in a way that keeps the grouped structure? I would like a call that applies the function to each column within the groups and keeps the groups. Then I could chain the calls."

You would have to give more details about what you are trying to achieve, but aggregate or transform do seem like good candidates.
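To illustrate the two candidates, here is a sketch on the question's frame, with the string-to-number conversion done inside each function: aggregate returns one value per group, indexed by the group key, while transform broadcasts each group's value back onto every original row.

```python
import pandas as pd

frame = pd.DataFrame({
    "number": ["1", "2", "3", "4", "5", "6", "7", "8"],
    "type": ["a"] * 4 + ["b"] * 4,
})

grouped = frame.groupby("type")["number"]

# aggregate: one row per group
per_group = grouped.agg(lambda s: pd.to_numeric(s).mean())

# transform: same length as the input, group value repeated on each row
per_row = grouped.transform(lambda s: pd.to_numeric(s).mean())

print(per_group.tolist())  # [2.5, 6.5]
print(per_row.tolist())  # [2.5, 2.5, 2.5, 2.5, 6.5, 6.5, 6.5, 6.5]
```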
