python group by, passing in columns to aggregate function params

Question

I'm trying to understand how to compute aggregates in pandas when the aggregate function takes several columns as arguments. I'm used to dplyr in R, where this is very simple.

In my example, 'data' has many columns, including 'TPR', 'FPR', and 'model'. There are many different datasets concatenated together, and I need to run my function at the 'model' grouped level.

grouped_data = data.groupby(['model']) 
grouped_data.aggregate( sklearn.metrics.auc(x='FPR',y='TPR') )

However, this results in an error.

fuglede · Accepted Answer

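Since you only want to apply a single function, you can use apply instead of aggregate. Its argument has to be a Python callable, which is called once per group with that group's DataFrame. Your original call fails because sklearn.metrics.auc(x='FPR', y='TPR') is evaluated immediately with the literal strings 'FPR' and 'TPR', before aggregate ever runs. In your case the working version would look like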

data.groupby('model').apply(lambda group: sklearn.metrics.auc(group.FPR, group.TPR))

For example:

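import numpy as np
import pandas as pd
import sklearn.metrics

# Toy labels and scores; roc_curve returns the FPR/TPR points of the ROC curve
y = np.array([1, 1, 2, 2])
pred = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, _ = sklearn.metrics.roc_curve(y, pred, pos_label=2)
df_a = pd.DataFrame({'model': 'a', 'FPR': fpr, 'TPR': tpr})
df_b = pd.DataFrame({'model': 'b', 'FPR': fpr, 'TPR': tpr})
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
data = pd.concat([df_a, df_b])
data.groupby('model').apply(lambda group: sklearn.metrics.auc(group.FPR, group.TPR))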

Output:

model
a    0.75
b    0.75
dtype: float64
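
A variant of the same pattern, as a minimal sketch: a named function instead of a lambda, with the needed columns selected before apply so the function never receives the grouping column (recent pandas versions warn when apply operates on it). The function name group_auc and the FPR/TPR values below are invented for illustration and are not from the question's data.

```python
import pandas as pd
from sklearn.metrics import auc

# Hypothetical ROC points for two models (values invented for illustration).
data = pd.DataFrame({
    "model": ["a", "a", "a", "b", "b", "b"],
    "FPR": [0.0, 0.5, 1.0, 0.0, 0.0, 1.0],
    "TPR": [0.0, 0.5, 1.0, 0.0, 1.0, 1.0],
})


def group_auc(group: pd.DataFrame) -> float:
    """Area under the curve for one group's FPR/TPR points."""
    return auc(group["FPR"], group["TPR"])


# Select only FPR and TPR, then call group_auc once per model.
auc_by_model = data.groupby("model")[["FPR", "TPR"]].apply(group_auc)
print(auc_by_model)
```

The result is a Series indexed by model, the same shape as the output above. Note that sklearn.metrics.auc expects the x values (here FPR) to be monotonic within each group, so unsorted data may need a sort_values('FPR') first.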
