Jh123

Reputation: 93

pandas groupby with both "mean" and list of rows

I have a dataframe with dates, names and a numerical score. I'm trying to use df.groupby to create a new df containing the average of the score over each hourly interval, along with the list of all the names appearing in that hour:

What I have:

Dates   Name  Score
Hour 1  A     5
Hour 2  B     2
Hour 2  C     6
Hour 2  D     4

What I want:

Hours  Name     Score (Avg)
1      A        5
2      [B,C,D]  4

I know I can create each of them separately with

df.groupby([pd.Grouper(key="date",freq='60min')]).agg("mean")

or use .apply(list) to get the names as lists.

But I would like to do both in one step. Any ideas? The data might be pretty big, so preferably the solution is reasonably fast. Thanks!

Upvotes: 1

Views: 50

Answers (1)

jezrael

Reputation: 863541

If you need a list for every group, including one-element lists, use GroupBy.agg with named aggregations:

df1 = (df.groupby([pd.Grouper(key="date",freq='60min')])
         .agg(Name = ("Name",list), Avg = ("Score","mean")))

Or:

df1 = (df.groupby(df["date"].dt.hour)
         .agg(Name = ("Name",list), Avg = ("Score","mean")))


print (df1)
           Name  Avg
date                
1           [A]  5.0
2     [B, C, D]  4.0
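
For anyone who wants to try this, here is a minimal reproducible sketch of both variants. The timestamps are made up to match the table in the question (the question only says "Hour 1" and "Hour 2"), and the names by_bin and by_hour are just for illustration:

```python
import pandas as pd

# Hypothetical sample data: the exact timestamps are an assumption,
# chosen so that one row falls in hour 1 and three rows in hour 2.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-01-01 01:10", "2021-01-01 02:05",
        "2021-01-01 02:20", "2021-01-01 02:45",
    ]),
    "Name": ["A", "B", "C", "D"],
    "Score": [5, 2, 6, 4],
})

# Variant 1: one group per 60-minute bin, indexed by the bin's start timestamp.
by_bin = df.groupby(pd.Grouper(key="date", freq="60min")).agg(
    Name=("Name", list), Avg=("Score", "mean"))

# Variant 2: one group per hour of day (0-23), indexed by the integer hour.
by_hour = df.groupby(df["date"].dt.hour).agg(
    Name=("Name", list), Avg=("Score", "mean"))

print(by_bin)
print(by_hour)
```

The two variants are not interchangeable when the data spans several days: pd.Grouper keeps each calendar hour separate, while dt.hour merges the same hour of day across all dates.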

If you need a scalar instead of a one-element list, you need a custom function:

df1 = (df.groupby(df["date"].dt.hour)
         .agg(Name = ("Name", lambda x: list(x) if len(x) > 1 else x.iloc[0]),
              Avg = ("Score", "mean")))
print (df1)
           Name  Avg
date                
1             A  5.0
2     [B, C, D]  4.0

Finally, to get an Hours column, use:

df1 = df1.reset_index().rename(columns={"date":"Hours"})
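
Putting the steps together, a rough end-to-end sketch could look like the following. The helper names unwrap_single and hourly_summary are hypothetical, and the sample timestamps are again an assumption made to match the question's table:

```python
import pandas as pd

def unwrap_single(names: pd.Series):
    """Return the lone value for one-row groups, otherwise a list of all values."""
    return names.iloc[0] if len(names) == 1 else list(names)

def hourly_summary(df: pd.DataFrame) -> pd.DataFrame:
    # Group by hour of day, aggregate names and mean score,
    # then move the index into a regular "Hours" column.
    out = df.groupby(df["date"].dt.hour).agg(
        Name=("Name", unwrap_single),
        Avg=("Score", "mean"),
    )
    return out.reset_index().rename(columns={"date": "Hours"})

# Hypothetical sample data (timestamps are assumed, not from the question).
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-01-01 01:10", "2021-01-01 02:05",
        "2021-01-01 02:20", "2021-01-01 02:45",
    ]),
    "Name": ["A", "B", "C", "D"],
    "Score": [5, 2, 6, 4],
})

result = hourly_summary(df)
print(result)
```

Note that mixing scalars and lists in one column makes it harder to process later, so the plain list aggregation is usually the safer choice unless the output is only for display.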

Upvotes: 1
