Reputation: 1971
Suppose I have the following dataframe
df = pd.DataFrame([
(2, 2, 'A', .5),
(2, 2, 'A', .6),
(2, 2, 'B', .75),
(2, 2, 'B', .7),
(2, 2, 'C', .6),
(2, 3, 'A', .65),
(2, 3, 'A', .6),
(2, 3, 'B', .75),
(2, 3, 'B', .7),
(2, 3, 'C', .6)
], columns=['out_size', 'problem_size', 'algo', 'time'])
I want to group by out_size and problem_size, then within each group keep the algo that has the lowest average time. The result should be:
pd.DataFrame(
[[2, 2, 'A', 0.55],
[2, 3, 'C', 0.6]], columns=['out_size', 'problem_size', 'algo', 'time'])
Upvotes: 1
Views: 78
Reputation: 765
(df.groupby(['out_size', 'problem_size'])
   .apply(lambda dd: dd.groupby('algo')['time'].mean().nsmallest(1))
   .reset_index())
output:
out_size problem_size algo time
0 2 2 A 0.55
1 2 3 C 0.60
Upvotes: 0
Reputation: 13251
cols = ['out_size', 'problem_size', 'algo']
out = (df.groupby(cols)['time'].mean()
         .unstack('algo')
         .agg(['idxmin', 'min'], axis=1)
         .reset_index())
out.rename(columns={'idxmin': 'algo', 'min': 'time'}, inplace=True)
print(out)
Output:
out_size problem_size algo time
0 2 2 A 0.55
1 2 3 C 0.6
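For intuition, the intermediate unstack('algo') pivots the averaged times into one column per algorithm, so idxmin/min along axis=1 pick the winning algo and its average time row by row. A minimal sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame([
    (2, 2, 'A', .5), (2, 2, 'A', .6), (2, 2, 'B', .75), (2, 2, 'B', .7),
    (2, 2, 'C', .6), (2, 3, 'A', .65), (2, 3, 'A', .6), (2, 3, 'B', .75),
    (2, 3, 'B', .7), (2, 3, 'C', .6),
], columns=['out_size', 'problem_size', 'algo', 'time'])

cols = ['out_size', 'problem_size', 'algo']
# One column per algo, one row per (out_size, problem_size) pair
wide = df.groupby(cols)['time'].mean().unstack('algo')
# e.g. wide.loc[(2, 3), 'A'] ≈ 0.625 and wide.loc[(2, 3), 'C'] ≈ 0.6,
# so idxmin along axis=1 picks 'C' for the (2, 3) row
```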
Upvotes: 1
Reputation: 71
You can do it by grouping values, sorting, and grouping again:
df1 = df.groupby(['out_size', 'problem_size', 'algo'])['time'].mean().reset_index()
df1 = df1.sort_values(['out_size', 'problem_size', 'time'])
df1 = df1.groupby(['out_size', 'problem_size']).head(1)
df1
Output:
out_size problem_size algo time
2 2 A 0.55
2 3 C 0.60
Upvotes: 1
Reputation: 14238
You can do it as follows: first group by the three columns and aggregate time by mean, then sort in ascending order. Since you only want the first row in each group, drop duplicates based on out_size and problem_size, keeping the first occurrence. Because the rows are already sorted by time, this guarantees you get the algo with the lowest average.
df.groupby(['out_size', 'problem_size', 'algo'], as_index=False).agg(
    {'time': 'mean'}).sort_values(
    by='time', ignore_index=True).drop_duplicates(
    subset=['out_size', 'problem_size'], keep='first', ignore_index=True)
Or, if you have more columns that all need to be averaged:
agg_cols = ['time']  # plus any other columns to average
groupby_cols = df.columns.drop(agg_cols).tolist()
g = df.groupby(groupby_cols, as_index=False).agg('mean')
g.sort_values(by='time', ignore_index=True).drop_duplicates(
    subset=['out_size', 'problem_size'], ignore_index=True, keep='first')
output:
out_size problem_size algo time
0 2 2 A 0.55
1 2 3 C 0.60
Upvotes: 1
Reputation: 793
You can do this with two groupbys:
group_cols = ["out_size", "problem_size", "algo"]
result = df.groupby(group_cols)["time"].mean().reset_index(drop=False)
This computes the mean time for each group. Then sort the times in ascending order:
result = result.sort_values("time", ascending=True)
and then group again (without the algo column) and take the first element of each group:
result = result.groupby(["out_size", "problem_size"])[["algo","time"]].first().reset_index(drop=False)
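Putting the three steps above together, a self-contained recap on the question's data (the intermediate variable names follow this answer):

```python
import pandas as pd

df = pd.DataFrame([
    (2, 2, 'A', .5), (2, 2, 'A', .6), (2, 2, 'B', .75), (2, 2, 'B', .7),
    (2, 2, 'C', .6), (2, 3, 'A', .65), (2, 3, 'A', .6), (2, 3, 'B', .75),
    (2, 3, 'B', .7), (2, 3, 'C', .6),
], columns=['out_size', 'problem_size', 'algo', 'time'])

group_cols = ["out_size", "problem_size", "algo"]
# Step 1: mean time per (out_size, problem_size, algo)
result = df.groupby(group_cols)["time"].mean().reset_index(drop=False)
# Step 2: sort so the fastest algo comes first within each pair
result = result.sort_values("time", ascending=True)
# Step 3: keep the first (fastest) row per (out_size, problem_size)
result = result.groupby(["out_size", "problem_size"])[["algo", "time"]].first().reset_index(drop=False)
# -> algo 'A' (mean time ≈ 0.55) for (2, 2), algo 'C' (≈ 0.6) for (2, 3)
```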
Upvotes: 1
Reputation: 262549
You can use a double groupby
:
cols = ['out_size', 'problem_size', 'algo']
out = (df
.groupby(cols, as_index=False)['time'].mean()
.sort_values(by='time')
.groupby(cols[:-1], as_index=False).first()
)
A slightly more efficient alternative that doesn't require sorting the values (but does require storing an intermediate):
cols = ['out_size', 'problem_size', 'algo']
out = df.groupby(cols)['time'].mean()
out = out.loc[out.groupby(cols[:-1]).idxmin()].reset_index()
output:
out_size problem_size algo time
0 2 2 A 0.55
1 2 3 C 0.60
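If the .loc step in the second variant is unclear: on a Series with a MultiIndex, groupby(...).idxmin() returns the full index tuple of each group's minimum, which .loc can then select directly. A minimal sketch, with the per-group average times hard-coded (as computed from the question's df) purely for illustration:

```python
import pandas as pd

# Averaged times indexed by (out_size, problem_size, algo)
out = pd.Series(
    [0.55, 0.725, 0.6, 0.625, 0.725, 0.6],
    index=pd.MultiIndex.from_tuples(
        [(2, 2, 'A'), (2, 2, 'B'), (2, 2, 'C'),
         (2, 3, 'A'), (2, 3, 'B'), (2, 3, 'C')],
        names=['out_size', 'problem_size', 'algo']),
    name='time',
)
# idxmin per (out_size, problem_size) returns the full MultiIndex label
# of each group's minimum row
winners = out.groupby(['out_size', 'problem_size']).idxmin()
# winners.tolist() == [(2, 2, 'A'), (2, 3, 'C')]
best = out.loc[winners].reset_index()
```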
Upvotes: 2