Reputation: 23
I am working on a data frame df
with the following structure:
start_time end_time key vol
0 2018-08-23 00:00:00 2018-08-23 01:30:00 abcd_eg 0.92
1 2018-08-23 00:15:00 2018-08-23 01:45:00 defg_x2 0.27
I am trying to generate a 15 minute interval between a start and an end time. I would like to have to result as new rows in the same dataframe (or a new dataframe) as follows:
start_time end_time key vol
0 2018-08-23 00:00:00 2018-08-23 01:30:00 abcd_eg 0.92
1 2018-08-23 00:15:00 2018-08-23 01:30:00 abcd_eg 0.92
2 2018-08-23 00:30:00 2018-08-23 01:30:00 abcd_eg 0.92
3 2018-08-23 00:45:00 2018-08-23 01:30:00 abcd_eg 0.92
4 2018-08-23 01:00:00 2018-08-23 01:30:00 abcd_eg 0.92
5 2018-08-23 01:15:00 2018-08-23 01:30:00 abcd_eg 0.92
6 2018-08-23 01:30:00 2018-08-23 01:30:00 abcd_eg 0.92
7 2018-08-23 00:15:00 2018-08-23 01:45:00 defg_x2 0.27
8 2018-08-23 00:30:00 2018-08-23 01:45:00 defg_x2 0.27
9 2018-08-23 00:45:00 2018-08-23 01:45:00 defg_x2 0.27
10 2018-08-23 01:00:00 2018-08-23 01:45:00 defg_x2 0.27
11 2018-08-23 01:15:00 2018-08-23 01:45:00 defg_x2 0.27
12 2018-08-23 01:30:00 2018-08-23 01:45:00 defg_x2 0.27
13 2018-08-23 01:45:00 2018-08-23 01:45:00 defg_x2 0.27
The date columns are of type datetime[64]
, key is object
and vol is float
.
What I tried till now is:
b=[]
lst = []
for i, row in df.iterrows():
b = pd.date_range(start=row.start_time, end=row.end_time, freq='15min',closed=None)
lst.append(b)
Used .iterrows()
as I have about 125 records. This gives me the timeseries as DatetimeIndex
for all the values of start and the end time in the dataframe, with 15 minute intervals.
After this, I tried to pass lst
as a new column unpack
in the dataframe df
as follows:
df['unpack'] = lst
My thoughts were that if I could get these values as the new column in df, I could use this solution to extract them as rows. But this process is not working.
How can I do this with pandas?
Upvotes: 2
Views: 373
Reputation: 3677
My fifty cents:
First recreate example df:
df = pd.DataFrame({"start_time": [datetime(2018, 8, 23), datetime(2018, 8, 23, 0, 15)],
"end_time": [datetime(2018, 8, 23, 1, 30), datetime(2018, 8, 23, 1, 45)],
"key": ["abcd_eg", "defg_x2"],
"vol": [0.92, 0.27]})
Loop over start times, create a new dataframe for each start_time with the desired index and store them in a list.
dfs = []
for row in df.itertuples():
part_df = pd.DataFrame(index=pd.DatetimeIndex(start=row.start_time, end=row.end_time, freq='15T'),
data={'end_time': row.end_time, 'key': row.key, 'vol': row.vol})
part_df.index.name = 'start_time'
dfs.append(part_df)
Now concatenate all dataframes and reset the index:
result = pd.concat(dfs).reset_index()
giving the following result:
start_time end_time key vol
0 2018-08-23 00:00:00 2018-08-23 01:30:00 abcd_eg 0.92
1 2018-08-23 00:15:00 2018-08-23 01:30:00 abcd_eg 0.92
2 2018-08-23 00:30:00 2018-08-23 01:30:00 abcd_eg 0.92
3 2018-08-23 00:45:00 2018-08-23 01:30:00 abcd_eg 0.92
4 2018-08-23 01:00:00 2018-08-23 01:30:00 abcd_eg 0.92
5 2018-08-23 01:15:00 2018-08-23 01:30:00 abcd_eg 0.92
6 2018-08-23 01:30:00 2018-08-23 01:30:00 abcd_eg 0.92
7 2018-08-23 00:15:00 2018-08-23 01:45:00 defg_x2 0.27
8 2018-08-23 00:30:00 2018-08-23 01:45:00 defg_x2 0.27
9 2018-08-23 00:45:00 2018-08-23 01:45:00 defg_x2 0.27
10 2018-08-23 01:00:00 2018-08-23 01:45:00 defg_x2 0.27
11 2018-08-23 01:15:00 2018-08-23 01:45:00 defg_x2 0.27
12 2018-08-23 01:30:00 2018-08-23 01:45:00 defg_x2 0.27
13 2018-08-23 01:45:00 2018-08-23 01:45:00 defg_x2 0.27
Upvotes: 1
Reputation: 1486
Edit: I see that you had added new info. Maybe this is what you are looking for, if the number of 15-min intervals is fixed, then you can try this. Edit2: Now, it also works with non-fixed 15-min intervals.
import pandas as pd
gap = '15min'
date_start = ['2018-08-23 00:00:00','2018-08-23 00:15:00','2018-08-24 00:45:00', '2018-08-24 00:30:00']
date_end = ['2018-08-23 01:30:00','2018-08-23 01:45:00','2018-08-24 01:00:00','2018-08-24 02:45:00']
count = 0
to_repeat = []
data = {'start_time':date_start,'end_time':date_end,'key':['abcd_eg','defg_x2', 'whef_98','tuyr_23'],'vol':[0.92,0.27,0.87,0.90]}
df = pd.DataFrame(data)
for _ in zip(date_start, date_end):
temp = pd.date_range(_[0], _[1], freq=gap)
to_repeat.append(len(temp))
if count==0:
ind = temp
else:
ind = ind.append(temp)
count+=1
df_final = df.reindex(df.index.repeat(to_repeat))
df_final['start_time'] = ind
df_final.reset_index(inplace=True)
df_final.drop(columns='index',inplace=True)
print(df_final)
Output
start_time end_time key vol
0 2018-08-23 00:00:00 2018-08-23 01:30:00 abcd_eg 0.92
1 2018-08-23 00:15:00 2018-08-23 01:30:00 abcd_eg 0.92
2 2018-08-23 00:30:00 2018-08-23 01:30:00 abcd_eg 0.92
3 2018-08-23 00:45:00 2018-08-23 01:30:00 abcd_eg 0.92
4 2018-08-23 01:00:00 2018-08-23 01:30:00 abcd_eg 0.92
5 2018-08-23 01:15:00 2018-08-23 01:30:00 abcd_eg 0.92
6 2018-08-23 01:30:00 2018-08-23 01:30:00 abcd_eg 0.92
7 2018-08-23 00:15:00 2018-08-23 01:45:00 defg_x2 0.27
8 2018-08-23 00:30:00 2018-08-23 01:45:00 defg_x2 0.27
9 2018-08-23 00:45:00 2018-08-23 01:45:00 defg_x2 0.27
10 2018-08-23 01:00:00 2018-08-23 01:45:00 defg_x2 0.27
11 2018-08-23 01:15:00 2018-08-23 01:45:00 defg_x2 0.27
12 2018-08-23 01:30:00 2018-08-23 01:45:00 defg_x2 0.27
13 2018-08-23 01:45:00 2018-08-23 01:45:00 defg_x2 0.27
14 2018-08-24 00:45:00 2018-08-24 01:00:00 whef_98 0.87
15 2018-08-24 01:00:00 2018-08-24 01:00:00 whef_98 0.87
16 2018-08-24 00:30:00 2018-08-24 02:45:00 tuyr_23 0.90
17 2018-08-24 00:45:00 2018-08-24 02:45:00 tuyr_23 0.90
18 2018-08-24 01:00:00 2018-08-24 02:45:00 tuyr_23 0.90
19 2018-08-24 01:15:00 2018-08-24 02:45:00 tuyr_23 0.90
20 2018-08-24 01:30:00 2018-08-24 02:45:00 tuyr_23 0.90
21 2018-08-24 01:45:00 2018-08-24 02:45:00 tuyr_23 0.90
22 2018-08-24 02:00:00 2018-08-24 02:45:00 tuyr_23 0.90
23 2018-08-24 02:15:00 2018-08-24 02:45:00 tuyr_23 0.90
24 2018-08-24 02:30:00 2018-08-24 02:45:00 tuyr_23 0.90
25 2018-08-24 02:45:00 2018-08-24 02:45:00 tuyr_23 0.90
Upvotes: 0