ast
ast

Reputation: 95

Pandas Dataframe - Convert time interval to continuous time series

I have an energy data with a start time and end time interval. I want to distribute the total energy to the continuous time series (dividing total energy to total hours). As I searched the results, I have found staircase and daterange functions. However, with these results I couldn't distribute energy and sum same intervals. Sample df;

df = pd.DataFrame({
    "Start":["2019-01-01 00:00:00", "2019-01-01 01:00:00", "2019-01-01 02:00:00"],
    "Stop":["2019-01-01 03:00:00", "2019-01-01 02:00:00", "2019-01-01 04:00:00"],
    "TotalEnergy":[10, 5, 2],
})

I want to distribute each row and sum same intervals with 15Min resolution. Note that I need TotalEnergy/TotalTime.

My trials;

Staircase Code:

import pandas as pd
times = pd.date_range(df["Start"].min(), df["Stop"].max(), freq="15min")
import staircase as sc
stepfunction = sc.Stairs(df, start="Start", end="Stop", value="TotalEnergy")
result = stepfunction(times, include_index=True)
result = pd.DataFrame(result)

Result 1:

2019-01-01 00:00:00  10
2019-01-01 00:15:00  10
2019-01-01 00:30:00  10
2019-01-01 00:45:00  10
2019-01-01 01:00:00  15
2019-01-01 01:15:00  15
2019-01-01 01:30:00  15
2019-01-01 01:45:00  15
2019-01-01 02:00:00  12
2019-01-01 02:15:00  12
2019-01-01 02:30:00  12
2019-01-01 02:45:00  12
2019-01-01 03:00:00   2
2019-01-01 03:15:00   2
2019-01-01 03:30:00   2
2019-01-01 03:45:00   2
2019-01-01 04:00:00   0

New code according to Riley, the author of "Staircase"

import pandas as pd

df = pd.DataFrame({
    "Start":["2019-01-01 00:00:00", "2019-01-01 01:00:00", "2019-01-01 02:00:00"],
    "Stop":["2019-01-01 03:00:00", "2019-01-01 02:00:00", "2019-01-01 04:00:00"],
    "TotalEnergy":[10, 5, 2],
})

df["Start"] = pd.to_datetime(df["Start"])
df["Stop"] = pd.to_datetime(df["Stop"]),

import staircase as sc

sf = sc.Stairs(df, start="Start", end="Stop", value="TotalEnergy")
bins = pd.date_range(df["Start"].min(), df["Stop"].max(), freq="15min")
sf.slice(pd.IntervalIndex.from_breaks(times)).mean()

Output:

(2019-01-01, 2019-01-01 00:15:00]             10.0
(2019-01-01 00:15:00, 2019-01-01 00:30:00]    10.0
(2019-01-01 00:30:00, 2019-01-01 00:45:00]    10.0
(2019-01-01 00:45:00, 2019-01-01 01:00:00]    10.0
(2019-01-01 01:00:00, 2019-01-01 01:15:00]    15.0
(2019-01-01 01:15:00, 2019-01-01 01:30:00]    15.0
(2019-01-01 01:30:00, 2019-01-01 01:45:00]    15.0
(2019-01-01 01:45:00, 2019-01-01 02:00:00]    15.0
(2019-01-01 02:00:00, 2019-01-01 02:15:00]    12.0
(2019-01-01 02:15:00, 2019-01-01 02:30:00]    12.0
(2019-01-01 02:30:00, 2019-01-01 02:45:00]    12.0
(2019-01-01 02:45:00, 2019-01-01 03:00:00]    12.0
(2019-01-01 03:00:00, 2019-01-01 03:15:00]     2.0
(2019-01-01 03:15:00, 2019-01-01 03:30:00]     2.0
(2019-01-01 03:30:00, 2019-01-01 03:45:00]     2.0
(2019-01-01 03:45:00, 2019-01-01 04:00:00]     2.0

However, this is not what I want. First energy is 10kWh, time interval is 3 hours, that means 12*15 minutes. So each bin should contain 10kWh/12bins = 0.83kWh. The stepfunction should look like maybe;

stepfunction = sc.Stairs(df, start="Start", end="Stop", value="TotalEnergy"/("Stop"-"Start")

The desired output (calculated by hand):

(2019-01-01, 2019-01-01 00:15:00]             0.83
(2019-01-01 00:15:00, 2019-01-01 00:30:00]    0.83
(2019-01-01 00:30:00, 2019-01-01 00:45:00]    0.83
(2019-01-01 00:45:00, 2019-01-01 01:00:00]    0.83
(2019-01-01 01:00:00, 2019-01-01 01:15:00]    2.08
(2019-01-01 01:15:00, 2019-01-01 01:30:00]    2.08
(2019-01-01 01:30:00, 2019-01-01 01:45:00]    2.08
(2019-01-01 01:45:00, 2019-01-01 02:00:00]    2.08
(2019-01-01 02:00:00, 2019-01-01 02:15:00]    1.08
(2019-01-01 02:15:00, 2019-01-01 02:30:00]    1.08
(2019-01-01 02:30:00, 2019-01-01 02:45:00]    1.08
(2019-01-01 02:45:00, 2019-01-01 03:00:00]    1.08
(2019-01-01 03:00:00, 2019-01-01 03:15:00]    0.25
(2019-01-01 03:15:00, 2019-01-01 03:30:00]    0.25
(2019-01-01 03:30:00, 2019-01-01 03:45:00]    0.25
(2019-01-01 03:45:00, 2019-01-01 04:00:00]    0.25

Upvotes: 1

Views: 833

Answers (2)

BeRT2me
BeRT2me

Reputation: 13242

                Start                Stop  Connected  Charged  TotalEnergy  MaxPower
0 2019-08-27 14:52:00 2019-08-27 17:58:19       3.11     3.10         9.86     3.342
1 2019-03-01 10:14:05 2019-03-01 13:13:54       3.00     3.00         9.38     3.440
2 2019-07-31 12:54:10 2019-07-31 13:21:45       0.46     0.46         1.45     3.160
3 2019-06-16 10:55:57 2019-06-16 14:35:14       3.65     3.65        38.77    10.813
4 2019-03-07 21:21:04 2019-03-08 07:22:59      10.03     7.15        26.14     3.884

Doing:

df['Intervals'] = df.apply(lambda x: pd.date_range(x.Start, x.Stop, freq='15min'), axis=1)
df = df.explode('Intervals').reset_index(drop=True)
df['Values'] = df.groupby(['Start', 'Stop'])['TotalEnergy'].transform(lambda x: x.mean() / len(x))
print(df[['Intervals', 'Values']])

Output:

             Intervals    Values
0  2019-08-27 14:52:00  0.758462
1  2019-08-27 15:07:00  0.758462
2  2019-08-27 15:22:00  0.758462
3  2019-08-27 15:37:00  0.758462
4  2019-08-27 15:52:00  0.758462
..                 ...       ...
78 2019-03-08 06:21:04  0.637561
79 2019-03-08 06:36:04  0.637561
80 2019-03-08 06:51:04  0.637561
81 2019-03-08 07:06:04  0.637561
82 2019-03-08 07:21:04  0.637561

[83 rows x 2 columns]

Upvotes: 0

Riley
Riley

Reputation: 2261

I'm not 100% clear on what is needed, but if it is to essentially take the average (i.e. "distribute" as you say) the energy within 15 minute intervals then the below approach with staircase can be used.

setup

import pandas as pd

df = pd.DataFrame({
    "Start":["2019-08-27 14:52:00", "2019-03-01 10:14:05", "2019-07-31 12:54:10", "2019-06-16 10:55:57", "2019-03-07 21:21:04"],
    "Stop":["2019-08-27 17:58:19", "2019-03-01 13:13:54", "2019-07-31 13:21:45", "2019-06-16 14:35:14", "2019-03-08 07:22:59"],
    "TotalEnergy":[9.86, 9.38, 1.45, 38.77, 26.14],
})

df["Start"] = pd.to_datetime(df["Start"])
df["Stop"] = pd.to_datetime(df["Stop"])

df looks like this

                Start                Stop  TotalEnergy
0 2019-08-27 14:52:00 2019-08-27 17:58:19         9.86
1 2019-03-01 10:14:05 2019-03-01 13:13:54         9.38
2 2019-07-31 12:54:10 2019-07-31 13:21:45         1.45
3 2019-06-16 10:55:57 2019-06-16 14:35:14        38.77
4 2019-03-07 21:21:04 2019-03-08 07:22:59        26.14

solution

create step function

import staircase as sc

sf = sc.Stairs(df, start="Start", end="Stop", value="TotalEnergy")

create 15 minute bins (a pandas.IntervalIndex)

times = pd.date_range(df["Start"].min(), df["Stop"].max(), freq="15min")
bins = pd.IntervalIndex.from_breaks(times)

slice the step function into bins and take the mean to get a pandas.Series indexed by the bins

sf.slice(bins).mean()

The result looks like this

(2019-03-01 10:14:05, 2019-03-01 10:29:05]    9.38
(2019-03-01 10:29:05, 2019-03-01 10:44:05]    9.38
(2019-03-01 10:44:05, 2019-03-01 10:59:05]    9.38
(2019-03-01 10:59:05, 2019-03-01 11:14:05]    9.38
(2019-03-01 11:14:05, 2019-03-01 11:29:05]    9.38
                                            ... 
(2019-08-27 16:29:05, 2019-08-27 16:44:05]    9.86
(2019-08-27 16:44:05, 2019-08-27 16:59:05]    9.86
(2019-08-27 16:59:05, 2019-08-27 17:14:05]    9.86
(2019-08-27 17:14:05, 2019-08-27 17:29:05]    9.86
(2019-08-27 17:29:05, 2019-08-27 17:44:05]    9.86
Length: 17214, dtype: float64

Note this will take a bit of computation time. Averaging across 17214 is not a vectorised calculation unfortunately.

updated solution

The solution above doesn't solve the problem @ast was intending. The solution below will.

After setup of dataframe, and conversion to timestamp etc create a column in your dataframe which represents the energy per hour rate

interval_hrs = (df["Stop"] - df["Start"])/pd.Timedelta("1hr")
df["TotalEnergyRate"] = df["TotalEnergy"]/interval_hrs

This time we'll use the value of TotalEnergyRate to create the step function

sf = sc.Stairs(df, start="Start", end="Stop", value="TotalEnergyRate")

What you have is a step function, whose value at any time will be the energy rate. If you want to know how much energy is consumed in a particular time period then the result is an integration. We'll slice as before, but this time call integral. This gives you a Series of pandas.Timedelta which you can then divide through by a 1 hour timedelta to convert back from Timedeltas to achieve the answer you are after.

times = pd.date_range(df["Start"].min(), df["Stop"].max(), freq="15min")
bins = pd.IntervalIndex.from_breaks(times)
sf.slice(bins).integral()/pd.Timedelta("1hr")

Disclaimer (required by stackoverflow): I am the creator of the staircase package.

Upvotes: 1

Related Questions