Reputation: 6615
I'm trying to understand how to aggregate in pandas when the aggregate function needs several columns as input. I'm used to dplyr in R, where this is mega simple...
In my example, 'data' has many columns, including 'TPR', 'FPR', and 'model'. There are many different datasets concatenated together, and I need to run my function at the 'model' grouped level.
grouped_data = data.groupby(['model'])
grouped_data.aggregate( sklearn.metrics.auc(x='FPR',y='TPR') )
However, this results in an error.
Upvotes: 2
Views: 1451
Reputation: 18221
As you only want to apply a single method, you can use apply instead of aggregate. The argument has to be a Python callable that is applied to each of the groups, so in your case that would look like:
data.groupby('model').apply(lambda group: sklearn.metrics.auc(group.FPR, group.TPR))
For example:
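import numpy as np
import pandas as pd
import sklearn.metrics
y = np.array([1, 1, 2, 2])
pred = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, _ = sklearn.metrics.roc_curve(y, pred, pos_label=2)
df_a = pd.DataFrame({'model': 'a', 'FPR': fpr, 'TPR': tpr})
df_b = pd.DataFrame({'model': 'b', 'FPR': fpr, 'TPR': tpr})
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
data = pd.concat([df_a, df_b])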
data.groupby('model').apply(lambda group: sklearn.metrics.auc(group.FPR, group.TPR))
Output:
model
a 0.75
b 0.75
dtype: float64
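If the lambda gets hard to read, the same idea can be written with a named function. This is only a sketch under a few assumptions: the helper name group_auc is made up for illustration, the columns are called FPR and TPR as in the question, and the rows are sorted inside the helper because sklearn.metrics.auc expects x to be monotonic (your real data may already be ordered). Selecting the two columns before apply also keeps the grouping column out of what the function receives.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import auc, roc_curve


def group_auc(group: pd.DataFrame) -> float:
    """Hypothetical helper: area under the ROC curve for one model's rows."""
    # Sort by FPR, then TPR, so auc() sees a monotonic x even when
    # the rows of a group arrive in arbitrary order.
    ordered = group.sort_values(["FPR", "TPR"])
    return auc(ordered["FPR"], ordered["TPR"])


# Same toy data as in the example above
y = np.array([1, 1, 2, 2])
pred = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, _ = roc_curve(y, pred, pos_label=2)
data = pd.concat(
    [
        pd.DataFrame({"model": "a", "FPR": fpr, "TPR": tpr}),
        pd.DataFrame({"model": "b", "FPR": fpr, "TPR": tpr}),
    ],
    ignore_index=True,
)

# Each group is passed to group_auc as a DataFrame holding only FPR and TPR
auc_by_model = data.groupby("model")[["FPR", "TPR"]].apply(group_auc)
print(auc_by_model)
```

The result is again a Series indexed by model, so auc_by_model.reset_index(name='auc') turns it into a regular DataFrame if that is easier to work with downstream.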
Upvotes: 4