Reputation: 93
I have a dataframe with dates, names and a numerical score. I'm trying to use groupby to create a new df containing the average of the score over each hourly interval, along with the list of all the names appearing in that hour:
What I have:
Dates | Name | Score |
---|---|---|
Hour 1 | A | 5 |
Hour 2 | B | 2 |
Hour 2 | C | 6 |
Hour 2 | D | 4 |
What I want:
Hours | Name | Score (Avg) |
---|---|---|
1 | A | 5 |
2 | [B,C,D] | 4 |
I know I can create each of them separately with
df.groupby([pd.Grouper(key="date",freq='60min')]).agg("mean")
or use .apply(list) to get the names in lists.
But I would like to do both at once. Any ideas? The data might be pretty big, so preferably the solution is somewhat fast. Thanks!
Upvotes: 1
Views: 50
Reputation: 863541
If you need a list for every group, including one-element lists, use GroupBy.agg with named aggregations:
df1 = (df.groupby([pd.Grouper(key="date",freq='60min')])
.agg(Name = ("Name",list), Avg = ("Score","mean")))
Or:
df1 = (df.groupby(df["date"].dt.hour)
.agg(Name = ("Name",list), Avg = ("Score","mean")))
print (df1)
Name Avg
date
1 [A] 5
2 [B, C, D] 4
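For anyone who wants to try this out, here is a minimal self-contained sketch of the named-aggregation approach. The sample timestamps are made up to mirror the table in the question (the real data is not shown), and the variable name `hourly` is only illustrative.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's table;
# the timestamps are assumptions, not the asker's real data.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-01-01 01:05", "2021-01-01 02:10",
        "2021-01-01 02:30", "2021-01-01 02:50",
    ]),
    "Name": ["A", "B", "C", "D"],
    "Score": [5, 2, 6, 4],
})

# One pass over the groups: collect names into a list and average the score.
hourly = (
    df.groupby(pd.Grouper(key="date", freq="60min"))
      .agg(Name=("Name", list), Avg=("Score", "mean"))
)
print(hourly)
```

With pd.Grouper the index holds the start timestamp of each 60-minute bin, so the same hour on different days stays in separate groups. Grouping by df["date"].dt.hour instead pools that hour across all days.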
If you need a scalar instead of a one-element list, use a custom function:
df1 = (df.groupby(df["date"].dt.hour)
.agg(Name = ("Name",lambda x: list(x) if len(x)>1 else x), Avg = ("Score","mean")))
print (df1)
Name Avg
date
1 A 5
2 [B, C, D] 4
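A possible variant of the custom function, sketched here on made-up sample data, returns the single element explicitly with .iloc[0] rather than handing back the one-row Series. The helper name `list_or_scalar` is hypothetical.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's table.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-01-01 01:05", "2021-01-01 02:10",
        "2021-01-01 02:30", "2021-01-01 02:50",
    ]),
    "Name": ["A", "B", "C", "D"],
    "Score": [5, 2, 6, 4],
})

def list_or_scalar(names):
    # Several names in the group: return them as a list.
    # Exactly one name: return the bare value instead of a one-element list.
    return list(names) if len(names) > 1 else names.iloc[0]

df1 = (
    df.groupby(df["date"].dt.hour)
      .agg(Name=("Name", list_or_scalar), Avg=("Score", "mean"))
)
print(df1)
```

Note that a Python-level function is called once per group, so on large data this is likely slower than passing list directly, and the resulting column mixes scalars and lists.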
Finally, to get the Hours column, use:
df1 = df1.reset_index().rename(columns={"date":"Hours"})
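Putting the steps together, a rough end-to-end sketch could look like the following. The sample data is again made up to match the question, and `result` is just an illustrative name.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's table.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-01-01 01:05", "2021-01-01 02:10",
        "2021-01-01 02:30", "2021-01-01 02:50",
    ]),
    "Name": ["A", "B", "C", "D"],
    "Score": [5, 2, 6, 4],
})

result = (
    df.groupby(df["date"].dt.hour)                      # group key is the hour number
      .agg(Name=("Name", list), Avg=("Score", "mean"))  # named aggregations
      .reset_index()                                    # move the hour out of the index
      .rename(columns={"date": "Hours"})                # index was named after the "date" column
)
print(result)
```

The rename works because the group key is derived from the "date" column, so the index (and the column created by reset_index) carries that name.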
Upvotes: 1