Reputation: 543
I have to elaborate simple statistics grouping a dataframe by one column, for instance day_of_the_week, and minutes ranges, for instance 15 minutess, without keeping the dates. In other words, I need statistics on what is happening in each interval of 15 minutes on all sundays, mondays, etc. not split by date. The starting dataframe is something like this
datetime high low Day_of_the_week HL_delta
2021-08-01 22:00:00 4403.00 4395.25 6.0 7.75
2021-08-01 22:15:00 4404.00 4401.00 6.0 3.00
2021-08-01 22:30:00 4409.00 4403.25 6.0 5.75
2021-08-01 22:45:00 4408.25 4406.25 6.0 2.00
2021-08-01 23:00:00 4408.25 4405.75 6.0 2.5
where datetime is the index of the dataframe and it is a DateTime type.
I need to calculate the mean and max value for each distinguished day of the week of HL_delta grouped by 15 minute ranges over 1 year data.
I have tried something like this
df_statistics['HL_mean'] = df_data_for_statistics.groupby([df_data_for_statistics['day_of_the_week'], pd.Grouper(freq=statistics_period_minutes_formatted,closed='left',label='left')]).agg({ "HL_delta": "mean"})
df_statistics['HL_max'] = df_data_for_statistics.groupby([df_data_for_statistics['day_of_the_week'], pd.Grouper(freq=statistics_period_minutes_formatted,closed='left',label='left')]).agg({ "HL_delta": "max"})
but what i get is not an aggregation on all the distinguished weekdays of the year, the aggregation is applied on group of 15 minutes of each date, not each monday, Tuesday, Wednesday,.... The statistics shall answer to the questions: "which is the max values of HL_delta between the time 00:00 and 00:15 of all the Mondays of the year", "which is the max values of HL_delta between the time 00:16 and 00:30 of all the Mondays of the year", ..., "which is the max values of HL_delta between the time 00:00 and 00:15 of all the Fridays of the year", ... etc. Instead what I get by this attempt is this
high low day_of_the_week HL_delta
datetime
2021-08-01 22:00:00 4403.00 4395.25 6.0 7.75
2021-08-01 22:15:00 4404.00 4401.00 6.0 3.00
2021-08-01 22:30:00 4409.00 4403.25 6.0 5.75
2021-08-01 22:45:00 4408.25 4406.25 6.0 2.00
2021-08-01 23:00:00 4408.25 4405.75 6.0 2.50
... ... ... ... ...
2022-03-21 22:45:00 4453.50 4451.50 0.0 2.00
2022-03-21 23:00:00 4452.25 4449.00 0.0 3.25
2022-03-21 23:15:00 4451.50 4449.25 0.0 2.25
2022-03-21 23:30:00 4451.50 4448.50 0.0 3.00
2022-03-21 23:45:00 4449.75 4445.25 0.0 4.50
Any suggestion?
Upvotes: 1
Views: 197
Reputation: 13458
With the following toy dataframe:
import random
import pandas as pd
list_of_dates = pd.date_range(start="2021-1-1", end="2021-12-31", freq="T")
df = pd.DataFrame(
{
"datetime": list_of_dates,
"low": [random.randint(0, 4_999) for _ in range(len(list_of_dates))],
"high": [random.randint(5_000, 9_999) for _ in range(len(list_of_dates))],
}
)
df["HL_delta"] = df["high"] - df["low"]
print(df)
# Output
datetime low high HL_delta
0 2021-01-01 00:00:00 4325 5059 734
1 2021-01-01 00:01:00 917 7224 6307
2 2021-01-01 00:02:00 2956 7804 4848
3 2021-01-01 00:03:00 1329 8056 6727
4 2021-01-01 00:04:00 1721 9144 7423
...
Here is one way to do it:
# Setup
df["weekday"] = df["datetime"].dt.day_name()
df["datetime_minute"] = df["datetime"].dt.minute
intervals = {"0-15": [0, 15], "16-30": [16, 30], "31-45": [31, 45], "46-59": [46, 59]}
# Find intervals
df["interval"] = df.apply(
lambda x: next(
filter(
None,
[
key
if x["datetime_minute"] >= intervals[key][0]
and x["datetime_minute"] <= intervals[key][1]
else None
for key in intervals.keys()
],
)
),
axis=1,
)
# Get stats
new_df = (
df.drop(columns=["low", "high", "datetime", "datetime_minute"])
.groupby(["weekday", "interval"], sort=False)
.agg(["max", "mean"])
)
And so:
print(new_df)
# Output
HL_delta
max mean
weekday interval
Friday 0-15 9989 5011.461666
16-30 9948 5003.452724
31-45 9902 4969.810577
46-59 9926 5007.599073
Saturday 0-15 9950 5004.103966
16-30 9961 4984.479327
31-45 9954 5005.854647
46-59 9973 5011.797447
Sunday 0-15 9979 4994.012270
16-30 9950 4981.940438
31-45 9877 5009.572276
46-59 9930 5020.719609
Monday 0-15 9974 4963.538812
16-30 9918 4977.481090
31-45 9971 4977.858173
46-59 9958 4992.886733
Tuesday 0-15 9924 5014.045623
16-30 9966 4990.358547
31-45 9948 4993.595566
46-59 9948 5000.271120
Wednesday 0-15 9975 4998.320463
16-30 9976 4981.763889
31-45 9981 4981.806303
46-59 9995 5001.579670
Thursday 0-15 9958 5015.276643
16-30 9900 4996.797489
31-45 9949 4991.088034
46-59 9948 4980.678457
Upvotes: 1