Xiphias

Reputation: 4716

How do I generate new columns using pandas groupby & aggregate?

I have a DataFrame on which I run:

df.groupby(by="mycol").agg({"colA": "sum", "colB": "count"})

However, this only works if colA and colB already exist in the DataFrame. What is the most "pandaic" approach to creating new columns from an aggregation?

Edit:

Basically, I have a set of columns, and my aggregations are not a 1:1 mapping from input column to output column. For example, I might want to aggregate the ratio of two columns' values into a single new column. Now think of a dictionary of such mappings.

I know that, in the example below, I could just filter on play and then compute the mean of the grouped data. That is not the point of the question, so please ignore this shortcut; it only works because the example is so simple.

>> df
    outlook   play  temperature
0     sunny   True           25
1     sunny   True           25
2  overcast   True           19
3      rain  False           21
4  overcast  False           33
5      rain  False           27
6      rain  False           22
7  overcast   True           26
8     sunny   True           13
9     sunny   True           16

# should become:
>> df.groupby(by="outlook").agg(?)
         play_mean_temp
sunny    19.75
overcast 22.50
rain     NaN

Upvotes: 1

Views: 962

Answers (1)

jezrael

Reputation: 863166

In your sample you can use:

print (df.groupby(by="outlook").apply(lambda x: x.loc[x.play, 'temperature'].mean()))
outlook
overcast    22.50
rain          NaN
sunny       19.75
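
For the "dictionary of such mappings" case from the question, the same apply idea can probably be extended by returning a Series from the function, so that each key becomes a new column. This is only a rough sketch: the `aggregations` dict, the `summarize` helper and the `play_ratio` column are names made up for illustration, not something from the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "outlook": ["sunny", "sunny", "overcast", "rain", "overcast",
                "rain", "rain", "overcast", "sunny", "sunny"],
    "play": [True, True, True, False, False,
             False, False, True, True, True],
    "temperature": [25, 25, 19, 21, 33, 27, 22, 26, 13, 16],
})

# Hypothetical mapping: new column name -> function of one group's sub-frame
aggregations = {
    "play_mean_temp": lambda g: g.loc[g["play"], "temperature"].mean(),
    "play_ratio": lambda g: g["play"].mean(),  # share of rows with play == True
}

def summarize(group):
    # Returning a Series makes apply build one output column per key
    return pd.Series({name: func(group) for name, func in aggregations.items()})

# Select the needed columns explicitly so the grouping column is not passed to apply
result = df.groupby("outlook")[["play", "temperature"]].apply(summarize)
print(result)
```

Note that apply calls the Python function once per group, so this may be slow when there are many groups.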

If you filter with boolean indexing first, groups with no matching rows (here, rain) are omitted:

print (df[df.play].groupby(by="outlook")['temperature'].mean())
outlook
overcast    22.50
sunny       19.75
Name: temperature, dtype: float64
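
If you want to keep all groups and still avoid apply, one possible alternative is to mask the values first and then aggregate the masked column. The sketch below assumes pandas 0.25 or newer (for named aggregation); `play_temp` is just a helper column name chosen here.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "outlook": ["sunny", "sunny", "overcast", "rain", "overcast",
                "rain", "rain", "overcast", "sunny", "sunny"],
    "play": [True, True, True, False, False,
             False, False, True, True, True],
    "temperature": [25, 25, 19, 21, 33, 27, 22, 26, 13, 16],
})

masked_result = (
    # where() keeps temperature where play is True and sets NaN elsewhere
    df.assign(play_temp=df["temperature"].where(df["play"]))
      .groupby("outlook")
      # named aggregation: output column = (input column, function)
      .agg(play_mean_temp=("play_temp", "mean"))
)
print(masked_result)
```

Because mean skips NaN, the rain group stays in the result with NaN instead of being dropped.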

Upvotes: 1
