Reputation: 1117
Let df be our test DataFrame from pandas:
import pandas as pd
import numpy as np
df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar','foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three','two', 'two', 'one', 'three'],
                   'C' : np.random.randn(8),
                   'D' : np.random.randn(8)})
What I want to do now is group by, say, column A, something like:
df.groupby(['A'])['C'].sum()
That works fine. Now, instead of using sum(), I want to apply my own function to summarise the data efficiently.
The equivalent in R would be:
require(plyr); require(dplyr)
df = data.frame(A = c('foo', 'bar', 'foo', 'bar','foo', 'bar', 'foo', 'foo'),
                B = c('one', 'one', 'two', 'three','two', 'two', 'one', 'three'),
                C = rnorm(8),
                D = rnorm(8))
with, for example, this function called myfun:
myfun <- function(x){sum(x**2)}
then:
df %>%
group_by(A) %>%
summarise(result = myfun(C))
I hope the question was clear enough. Many thanks!
Upvotes: 2
Views: 995
Reputation: 29740
You could use agg and place your custom function in a lambda, e.g.
>>> df.groupby('A').C.agg(lambda x: x.pow(2).sum())
A
bar 3.787664
foo 2.448404
Name: C, dtype: float64
Or you could define it separately and pass it to agg.
def sum2(x):
    return x.pow(2).sum()
>>> df.groupby('A').C.agg(sum2)
A
bar 3.787664
foo 2.448404
Name: C, dtype: float64
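If you want the output column named the way summarise(result = myfun(C)) names it in dplyr, named aggregation is one option. The following is a minimal sketch, assuming pandas 0.25 or later. The seed, the variable names result and fast, and the docstring are illustrative choices, not part of the question. Because the question asks about efficiency, it also shows a variant that squares the column first and then uses the built-in sum, which avoids calling a Python function once per group.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (the seed value is arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': rng.standard_normal(8),
    'D': rng.standard_normal(8),
})

def sum2(x):
    """Sum of squares of one group's values (x is a Series)."""
    return x.pow(2).sum()

# Named aggregation: output column 'result' is sum2 applied to column 'C'.
result = df.groupby('A').agg(result=('C', sum2))

# Same numbers without a per-group Python callback:
# square the whole column once, then group and use the built-in sum.
fast = df['C'].pow(2).groupby(df['A']).sum()

print(result)
print(fast)
```

Both give one value per group in A. The first returns a DataFrame with a single column called result, the second a Series.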
Note also that agg accepts many kinds of input for its function argument, so it is fairly flexible. From the docs, the arg function used for aggregating groups can at the moment be a:
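As a hedged illustration (not the docs' full list), here are three forms that agg does accept: the string name of a built-in reduction, a callable, and a list mixing the two. The small fixed DataFrame and the variable names are made up for this sketch so the results are easy to check by hand.

```python
import pandas as pd

# Small fixed frame (hypothetical data) so the results can be verified by hand.
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'C': [1.0, 2.0, 3.0, 4.0]})

def sum2(x):
    # Sum of squares of one group's values.
    return x.pow(2).sum()

grouped = df.groupby('A')['C']

by_name = grouped.agg('sum')          # string name of a built-in reduction
by_func = grouped.agg(sum2)           # any callable reducing a Series to a scalar
by_list = grouped.agg(['sum', sum2])  # list: one output column per entry

print(by_name)   # bar 6.0, foo 4.0
print(by_func)   # bar 20.0, foo 10.0
print(by_list)   # columns 'sum' and 'sum2'
```

When a list is passed, each function's name becomes a column label, which is one reason to prefer a named function over a lambda.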
Upvotes: 3