Pandas Dataframe - Convert time interval to continuous time series

Question

I have an energy data with a start time and end time interval. I want to distribute the total energy to the continuous time series (dividing total energy to total hours). As I searched the results, I have found staircase and daterange functions. However, with these results I couldn't distribute energy and sum same intervals. Sample df;

df = pd.DataFrame({
    "Start":["2019-01-01 00:00:00", "2019-01-01 01:00:00", "2019-01-01 02:00:00"],
    "Stop":["2019-01-01 03:00:00", "2019-01-01 02:00:00", "2019-01-01 04:00:00"],
    "TotalEnergy":[10, 5, 2],
})

I want to distribute each row and sum same intervals with 15Min resolution. Note that I need TotalEnergy/TotalTime.

My trials;

Staircase Code:

import pandas as pd
times = pd.date_range(df["Start"].min(), df["Stop"].max(), freq="15min")
import staircase as sc
stepfunction = sc.Stairs(df, start="Start", end="Stop", value="TotalEnergy")
result = stepfunction(times, include_index=True)
result = pd.DataFrame(result)

Result 1:

2019-01-01 00:00:00  10
2019-01-01 00:15:00  10
2019-01-01 00:30:00  10
2019-01-01 00:45:00  10
2019-01-01 01:00:00  15
2019-01-01 01:15:00  15
2019-01-01 01:30:00  15
2019-01-01 01:45:00  15
2019-01-01 02:00:00  12
2019-01-01 02:15:00  12
2019-01-01 02:30:00  12
2019-01-01 02:45:00  12
2019-01-01 03:00:00   2
2019-01-01 03:15:00   2
2019-01-01 03:30:00   2
2019-01-01 03:45:00   2
2019-01-01 04:00:00   0

New code according to Riley, the author of "Staircase"

import pandas as pd

df = pd.DataFrame({
    "Start":["2019-01-01 00:00:00", "2019-01-01 01:00:00", "2019-01-01 02:00:00"],
    "Stop":["2019-01-01 03:00:00", "2019-01-01 02:00:00", "2019-01-01 04:00:00"],
    "TotalEnergy":[10, 5, 2],
})

df["Start"] = pd.to_datetime(df["Start"])
df["Stop"] = pd.to_datetime(df["Stop"]),

import staircase as sc

sf = sc.Stairs(df, start="Start", end="Stop", value="TotalEnergy")
bins = pd.date_range(df["Start"].min(), df["Stop"].max(), freq="15min")
sf.slice(pd.IntervalIndex.from_breaks(times)).mean()

Output:

(2019-01-01, 2019-01-01 00:15:00]             10.0
(2019-01-01 00:15:00, 2019-01-01 00:30:00]    10.0
(2019-01-01 00:30:00, 2019-01-01 00:45:00]    10.0
(2019-01-01 00:45:00, 2019-01-01 01:00:00]    10.0
(2019-01-01 01:00:00, 2019-01-01 01:15:00]    15.0
(2019-01-01 01:15:00, 2019-01-01 01:30:00]    15.0
(2019-01-01 01:30:00, 2019-01-01 01:45:00]    15.0
(2019-01-01 01:45:00, 2019-01-01 02:00:00]    15.0
(2019-01-01 02:00:00, 2019-01-01 02:15:00]    12.0
(2019-01-01 02:15:00, 2019-01-01 02:30:00]    12.0
(2019-01-01 02:30:00, 2019-01-01 02:45:00]    12.0
(2019-01-01 02:45:00, 2019-01-01 03:00:00]    12.0
(2019-01-01 03:00:00, 2019-01-01 03:15:00]     2.0
(2019-01-01 03:15:00, 2019-01-01 03:30:00]     2.0
(2019-01-01 03:30:00, 2019-01-01 03:45:00]     2.0
(2019-01-01 03:45:00, 2019-01-01 04:00:00]     2.0

However, this is not what I want. First energy is 10kWh, time interval is 3 hours, that means 12*15 minutes. So each bin should contain 10kWh/12bins = 0.83kWh. The stepfunction should look like maybe;

stepfunction = sc.Stairs(df, start="Start", end="Stop", value="TotalEnergy"/("Stop"-"Start")

The desired output (calculated by hand):

(2019-01-01, 2019-01-01 00:15:00]             0.83
(2019-01-01 00:15:00, 2019-01-01 00:30:00]    0.83
(2019-01-01 00:30:00, 2019-01-01 00:45:00]    0.83
(2019-01-01 00:45:00, 2019-01-01 01:00:00]    0.83
(2019-01-01 01:00:00, 2019-01-01 01:15:00]    2.08
(2019-01-01 01:15:00, 2019-01-01 01:30:00]    2.08
(2019-01-01 01:30:00, 2019-01-01 01:45:00]    2.08
(2019-01-01 01:45:00, 2019-01-01 02:00:00]    2.08
(2019-01-01 02:00:00, 2019-01-01 02:15:00]    1.08
(2019-01-01 02:15:00, 2019-01-01 02:30:00]    1.08
(2019-01-01 02:30:00, 2019-01-01 02:45:00]    1.08
(2019-01-01 02:45:00, 2019-01-01 03:00:00]    1.08
(2019-01-01 03:00:00, 2019-01-01 03:15:00]    0.25
(2019-01-01 03:15:00, 2019-01-01 03:30:00]    0.25
(2019-01-01 03:30:00, 2019-01-01 03:45:00]    0.25
(2019-01-01 03:45:00, 2019-01-01 04:00:00]    0.25

Riley · Accepted Answer

I'm not 100% clear on what is needed, but if it is to essentially take the average (i.e. "distribute" as you say) the energy within 15 minute intervals then the below approach with staircase can be used.

setup

import pandas as pd

df = pd.DataFrame({
    "Start":["2019-08-27 14:52:00", "2019-03-01 10:14:05", "2019-07-31 12:54:10", "2019-06-16 10:55:57", "2019-03-07 21:21:04"],
    "Stop":["2019-08-27 17:58:19", "2019-03-01 13:13:54", "2019-07-31 13:21:45", "2019-06-16 14:35:14", "2019-03-08 07:22:59"],
    "TotalEnergy":[9.86, 9.38, 1.45, 38.77, 26.14],
})

df["Start"] = pd.to_datetime(df["Start"])
df["Stop"] = pd.to_datetime(df["Stop"])

df looks like this

                Start                Stop  TotalEnergy
0 2019-08-27 14:52:00 2019-08-27 17:58:19         9.86
1 2019-03-01 10:14:05 2019-03-01 13:13:54         9.38
2 2019-07-31 12:54:10 2019-07-31 13:21:45         1.45
3 2019-06-16 10:55:57 2019-06-16 14:35:14        38.77
4 2019-03-07 21:21:04 2019-03-08 07:22:59        26.14

solution

create step function

import staircase as sc

sf = sc.Stairs(df, start="Start", end="Stop", value="TotalEnergy")

create 15 minute bins (a pandas.IntervalIndex)

times = pd.date_range(df["Start"].min(), df["Stop"].max(), freq="15min")
bins = pd.IntervalIndex.from_breaks(times)

slice the step function into bins and take the mean to get a pandas.Series indexed by the bins

sf.slice(bins).mean()

The result looks like this

(2019-03-01 10:14:05, 2019-03-01 10:29:05]    9.38
(2019-03-01 10:29:05, 2019-03-01 10:44:05]    9.38
(2019-03-01 10:44:05, 2019-03-01 10:59:05]    9.38
(2019-03-01 10:59:05, 2019-03-01 11:14:05]    9.38
(2019-03-01 11:14:05, 2019-03-01 11:29:05]    9.38
                                            ... 
(2019-08-27 16:29:05, 2019-08-27 16:44:05]    9.86
(2019-08-27 16:44:05, 2019-08-27 16:59:05]    9.86
(2019-08-27 16:59:05, 2019-08-27 17:14:05]    9.86
(2019-08-27 17:14:05, 2019-08-27 17:29:05]    9.86
(2019-08-27 17:29:05, 2019-08-27 17:44:05]    9.86
Length: 17214, dtype: float64

Note this will take a bit of computation time. Averaging across 17214 is not a vectorised calculation unfortunately.

updated solution

The solution above doesn't solve the problem @ast was intending. The solution below will.

After setup of dataframe, and conversion to timestamp etc create a column in your dataframe which represents the energy per hour rate

interval_hrs = (df["Stop"] - df["Start"])/pd.Timedelta("1hr")
df["TotalEnergyRate"] = df["TotalEnergy"]/interval_hrs

This time we'll use the value of TotalEnergyRate to create the step function

sf = sc.Stairs(df, start="Start", end="Stop", value="TotalEnergyRate")

What you have is a step function, whose value at any time will be the energy rate. If you want to know how much energy is consumed in a particular time period then the result is an integration. We'll slice as before, but this time call integral. This gives you a Series of pandas.Timedelta which you can then divide through by a 1 hour timedelta to convert back from Timedeltas to achieve the answer you are after.

times = pd.date_range(df["Start"].min(), df["Stop"].max(), freq="15min")
bins = pd.IntervalIndex.from_breaks(times)
sf.slice(bins).integral()/pd.Timedelta("1hr")

Disclaimer (required by stackoverflow): I am the creator of the staircase package.

Pandas Dataframe - Convert time interval to continuous time series

Answers (2)

Related Questions