pandas groupby with both "mean" and list of rows

Question

I have a dataframe with dates, names and a numerical score. I'm trying to use pandas groupby to create a new df containing the average score over each hourly interval, along with the list of all the names appearing in that hour:

What I have:

Dates	Name	Score
Hour 1	A	5
Hour 2	B	2
Hour 2	C	6
Hour 2	D	4

What I want:

Hours	Name	Score (Avg)
1	A	5
2	[B,C,D]	4

I know I can create each of them separately with

df.groupby([pd.Grouper(key="date",freq='60min')]).agg("mean")

or use .apply(list) to get the names as lists.

But I would like to do both. Any ideas? The data might be fairly large, so preferably the solution is reasonably fast. Thanks!

jezrael · Accepted Answer

If you need lists for every group, including one-element lists, use GroupBy.agg with named aggregations:

df1 = (df.groupby([pd.Grouper(key="date",freq='60min')])
         .agg(Name = ("Name",list), Avg = ("Score","mean")))

Or, to group by hour of day (the same hour on different dates then falls into one group):

df1 = (df.groupby(df["date"].dt.hour)
         .agg(Name = ("Name",list), Avg = ("Score","mean")))


print (df1)
           Name  Avg
date                
1           [A]  5.0
2     [B, C, D]  4.0
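
As a self-contained sketch of the two groupings above: the timestamps below are invented for illustration (the question only shows "Hour 1" and "Hour 2"), and the column name "date" is assumed from the question's code. The names by_bin and by_hour are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the exact timestamps are an assumption.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-01-01 01:10", "2021-01-01 02:05",
        "2021-01-01 02:30", "2021-01-01 02:55",
    ]),
    "Name": ["A", "B", "C", "D"],
    "Score": [5, 2, 6, 4],
})

# Named aggregation: collect the names into a list and average the scores
# for each 60-minute bin. The index holds the bin start timestamps.
by_bin = df.groupby(pd.Grouper(key="date", freq="60min")).agg(
    Name=("Name", list), Avg=("Score", "mean")
)

# Same aggregation keyed by hour of day; the index holds integers (1, 2).
by_hour = df.groupby(df["date"].dt.hour).agg(
    Name=("Name", list), Avg=("Score", "mean")
)

print(by_bin)
print(by_hour)
```

With pd.Grouper the index keeps the full timestamp of each bin, so it is probably the safer choice when the data spans more than one day.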

If you need a scalar instead of a one-element list, use a custom function:

df1 = (df.groupby(df["date"].dt.hour)
         .agg(Name = ("Name", lambda x: list(x) if len(x) > 1 else x.iloc[0]),
              Avg = ("Score", "mean")))
print (df1)
           Name  Avg
date                
1             A  5.0
2     [B, C, D]  4.0

Finally, to get an Hours column, use:

df1 = df1.reset_index().rename(columns={"date":"Hours"})
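
Putting the steps together, a minimal end-to-end sketch might look like the following. The sample timestamps are again invented, and names_or_scalar is a hypothetical helper that stands in for the lambda; it is not part of pandas.

```python
import pandas as pd

# Hypothetical sample data; the exact timestamps are an assumption.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-01-01 01:10", "2021-01-01 02:05",
        "2021-01-01 02:30", "2021-01-01 02:55",
    ]),
    "Name": ["A", "B", "C", "D"],
    "Score": [5, 2, 6, 4],
})


def names_or_scalar(names: pd.Series):
    """Return the lone name as a scalar, otherwise a list of all names."""
    return names.iloc[0] if len(names) == 1 else list(names)


result = (
    df.groupby(df["date"].dt.hour)
      .agg(Name=("Name", names_or_scalar), Avg=("Score", "mean"))
      .reset_index()                      # move the hour out of the index
      .rename(columns={"date": "Hours"})  # the key inherits the name "date"
)

print(result)
```

Regarding speed: "mean" is a built-in aggregation, while building lists calls a Python function once per group, so the list column is likely to dominate the runtime when there are many groups.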
