Reputation: 4716
I have a DataFrame on which I run:
df.groupby(by="mycol").agg({"colA": "sum", "colB": "count"})
However, colA and colB need to exist. What is the most "pandaic" approach to creating new columns from an aggregation?
Edit:
Basically, I have a set of columns and my aggregations are not a 1:1 mapping. Thus, consider an example where I would want to aggregate the ratio of two columns' values as a new column. Now think of a dictionary of such mappings.
I know that, in the example, I could just filter for play and then compute the mean on the grouped data. But that's not the point of the question, so please ignore this simple solution, which is just a side effect of the simple example.
>> df
    outlook   play  temperature
0     sunny   True           25
1     sunny   True           25
2  overcast   True           19
3      rain  False           21
4  overcast  False           33
5      rain  False           27
6      rain  False           22
7  overcast   True           26
8     sunny   True           13
9     sunny   True           16
# should become:
>> df.groupby(by="outlook").agg(?)
          play_mean_temp
sunny              19.75
overcast           22.50
rain                 NaN
Upvotes: 1
Views: 962
Reputation: 863166
In your sample you can use:
print (df.groupby(by="outlook").apply(lambda x: x.loc[x.play, 'temperature'].mean()))
outlook
overcast    22.50
rain          NaN
sunny       19.75
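For the more general case from the question (several derived columns that are not a 1:1 mapping of existing ones), the same apply pattern can return a Series per group. This is only a minimal sketch on the sample data; the helper name summarize and the second column play_share are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "outlook": ["sunny", "sunny", "overcast", "rain", "overcast",
                "rain", "rain", "overcast", "sunny", "sunny"],
    "play": [True, True, True, False, False, False, False, True, True, True],
    "temperature": [25, 25, 19, 21, 33, 27, 22, 26, 13, 16],
})

def summarize(group):
    # Hypothetical helper: each key becomes a column of the result.
    played = group.loc[group["play"], "temperature"]
    return pd.Series({
        "play_mean_temp": played.mean(),      # NaN when no row has play == True
        "play_share": group["play"].mean(),   # fraction of rows with play == True
    })

# Selecting the columns first keeps the grouping column out of each group.
result = df.groupby("outlook")[["play", "temperature"]].apply(summarize)
print(result)
```

Each group produces one row, so groups without any play == True rows are kept and simply get NaN for play_mean_temp.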
If you filter with boolean indexing first, groups with no matching rows (here rain) are omitted from the result:
print (df[df.play].groupby(by="outlook")['temperature'].mean())
outlook
overcast    22.50
sunny       19.75
Name: temperature, dtype: float64
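If the empty groups should be kept without using apply, one possible alternative is to mask the values instead of dropping the rows, and then use named aggregation (available since pandas 0.25). This is a sketch under the assumption that the sample data from the question is used; the intermediate column name play_temp is hypothetical.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "outlook": ["sunny", "sunny", "overcast", "rain", "overcast",
                "rain", "rain", "overcast", "sunny", "sunny"],
    "play": [True, True, True, False, False, False, False, True, True, True],
    "temperature": [25, 25, 19, 21, 33, 27, 22, 26, 13, 16],
})

# Mask instead of filter: rows with play == False become NaN
# but stay in their group, so no group disappears.
masked = df.assign(play_temp=df["temperature"].where(df["play"]))

# Named aggregation: output column = (input column, aggregation).
result = masked.groupby("outlook").agg(play_mean_temp=("play_temp", "mean"))
print(result)
```

Because mean skips NaN values, a group whose values are all masked (rain) ends up as NaN rather than being dropped.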
Upvotes: 1