Reputation: 2850
My timeseries dataframe looks like below:
ts_ms          a   b   c  flow  latency  duration
1614772770705  10  10  4  1     2        3
1614772770800  10  10  2  1     2        4
1614772770750  10  5   4  1     2        3
I need to create a 5Min bucket, then group by a, b, c such that latency is summed and duration is weighted-averaged on flow.
What I have so far is
wm = lambda x: (x * df.loc[x.index, "flow"]).sum() / df.flow.sum()

def agg_func(df):
    df.groupby(pd.Grouper(freq='5Min')).agg(latency_sum=("latency", "sum"), duration_weighted=("duration", wm))

# convert to datetimes
df['ts_date'] = pd.to_datetime(df['ts_ms'])
df.set_index('ts_date', inplace=True)
df1 = df.groupby(["a", "b", "c"]).apply(agg_func)
That does not work. I basically get an empty dataframe as df1.
What am I missing? Please suggest.
EDIT
For clarity, the expected output dataframe should have below columns with some values ...
ts_date  a  b  c  latency_sum  duration_weighted
But I get an empty dataframe
df1.to_dict('records')
[]
Upvotes: 1
Views: 152
Reputation: 71570
You also have to return the result from agg_func:
wm = lambda x: (x * df.loc[x.index, "flow"]).sum() / df.flow.sum()

def agg_func(df):
    return df.groupby(pd.Grouper(freq='5Min')).agg(latency_sum=("latency", "sum"), duration_weighted=("duration", wm))

# convert to datetimes
df['ts_date'] = pd.to_datetime(df['ts_ms'])
df.set_index('ts_date', inplace=True)
df1 = df.groupby(["a", "b", "c"]).apply(agg_func)
print(df1)
Output:
                               latency_sum  duration_weighted
a  b  c ts_date
10 5  4 1970-01-01 00:25:00              2           1.000000
   10 2 1970-01-01 00:25:00              2           1.333333
      4 1970-01-01 00:25:00              2           1.000000
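Two things in that output may not be what the question intends. The timestamps land in 1970 because pd.to_datetime reads plain integers as nanoseconds unless a unit is given, and the weighted value is divided by df.flow.sum(), the flow total of the whole frame rather than of each group. A minimal sketch of one possible alternative follows, assuming ts_ms is epoch milliseconds and that the weights should be normalised per (a, b, c, bucket) group. The helper columns weighted, weighted_sum and flow_sum are names made up for this illustration, and the sample frame is rebuilt from the rows in the question.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "ts_ms": [1614772770705, 1614772770800, 1614772770750],
    "a": [10, 10, 10],
    "b": [10, 10, 5],
    "c": [4, 2, 4],
    "flow": [1, 1, 1],
    "latency": [2, 2, 2],
    "duration": [3, 4, 3],
})

# Assumption: ts_ms is epoch milliseconds, so state the unit explicitly.
df["ts_date"] = pd.to_datetime(df["ts_ms"], unit="ms")

# Precompute duration * flow so the weighted mean is a ratio of two sums.
df["weighted"] = df["duration"] * df["flow"]

# Group by a, b, c and a 5-minute bucket on ts_date in a single pass.
grouped = df.groupby(["a", "b", "c", pd.Grouper(key="ts_date", freq="5min")])
out = grouped.agg(
    latency_sum=("latency", "sum"),
    weighted_sum=("weighted", "sum"),
    flow_sum=("flow", "sum"),
)

# Weighted average of duration, normalised by each group's own flow total.
out["duration_weighted"] = out["weighted_sum"] / out["flow_sum"]
df1 = out[["latency_sum", "duration_weighted"]].reset_index()
print(df1)
```

With the three sample rows, every group falls in the 2021-03-03 11:55:00 bucket, latency_sum is 2 for each, and duration_weighted is 3.0, 4.0 and 3.0, which equals the plain duration because each group holds a single row. Doing the weighting with ordinary column sums also avoids the lambda that reaches back into the outer df.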
Upvotes: 1