Reputation: 23577
I have two time-series data frames. Both contain only the values 0 and 1. The first is called init_signal and the second is called end_signal. The idea is to create a new data frame in which, whenever init_signal has a 1, the value stays at 1 up to and including the NEXT 1 in end_signal.
The examples below merge init_signal and end_signal into one data frame: the first column is init_signal and the second column is end_signal.
Example 1:
2016-06-13 1 0
2016-06-14 0 0
2016-06-15 0 1
2016-06-16 0 0
Will become:
2016-06-13 1
2016-06-14 1
2016-06-15 1
2016-06-16 0
Example 2:
2016-06-13 1 1
2016-06-14 0 0
2016-06-15 0 1
2016-06-16 0 0
Will become:
2016-06-13 1
2016-06-14 1
2016-06-15 1
2016-06-16 0
In the second example, a 1 in end_signal that falls on the same row as a 1 in init_signal is ignored.
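To make the two examples reproducible, here is a minimal sketch that builds them as DataFrames. The variable names and the `expected` column are assumptions added for illustration; only `init_signal` and `end_signal` come from the description above.

```python
import pandas as pd

# Four consecutive days, matching the dates shown in the examples.
dates = pd.date_range("2016-06-13", periods=4, freq="D")

# Example 1: the end signal arrives two days after the init signal.
example_1 = pd.DataFrame(
    {
        "init_signal": [1, 0, 0, 0],
        "end_signal": [0, 0, 1, 0],
        "expected": [1, 1, 1, 0],  # hypothetical column holding the wanted output
    },
    index=dates,
)

# Example 2: an end signal on the init day itself is ignored.
example_2 = pd.DataFrame(
    {
        "init_signal": [1, 0, 0, 0],
        "end_signal": [1, 0, 1, 0],
        "expected": [1, 1, 1, 0],
    },
    index=dates,
)
```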
I've been using a complex loop. Is there a non-looping way?
Upvotes: 5
Views: 347
Reputation: 1208
This can be achieved in three steps and no loops: mark the days that have an init_signal or an end_signal, forward-fill the gaps, and then set the end_signal days back to 1. (Records with both signals default to the start signal only.)
Note: I'm not sure how to handle cases at the beginning. I'm presuming that there will always be an init_signal preceding an end_signal. If that's not true, you'll need to build a handler for that, but that would be a simple conditional at the end of these steps.
In the first step we create a column that is populated only on days where there was an init_signal or an end_signal.
import numpy as np
df['result'] = np.nan
#Add end dates
mask = (df['end_signal'] == 1)
df.loc[mask,'result'] = 0
#Add init_signals.
#Note: This will overwrite days that already have an end_signal
mask = (df['init_signal'] == 1)
df.loc[mask,'result'] = 1
This takes the NaN values and replaces them with the status that occurred most recently.
df['result'] = df['result'].ffill()
Per the question's examples, dates with an end_signal should be 1, so we need to set those rows back to 1 in the dataframe.
mask = (df['end_signal'] == 1)
df.loc[mask,'result'] = 1
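The three steps can be wrapped in one function. This is a minimal sketch rather than the exact code above: the name `hold_until_end` is hypothetical, and the extra `closing` check is one assumed way to handle an end_signal that has no init_signal before it (such rows stay 0 instead of being set to 1).

```python
import numpy as np
import pandas as pd


def hold_until_end(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: 1 from each init_signal through the next end_signal."""
    state = pd.Series(np.nan, index=df.index)
    # Step 1: mark end days as 0, then init days as 1 (init wins on a tie).
    state.loc[df["end_signal"] == 1] = 0.0
    state.loc[df["init_signal"] == 1] = 1.0
    # Step 2: carry the latest marker forward; rows before any marker become 0.
    state = state.ffill().fillna(0).astype(int)
    # Step 3: an end day counts as "on" only if the previous row was on
    # (assumed handling for end signals that have no preceding init signal).
    previous = state.shift(fill_value=0)
    closing = (df["end_signal"] == 1) & (previous == 1)
    return (state.astype(bool) | closing).astype(int)


example = pd.DataFrame({"init_signal": [1, 0, 0, 0], "end_signal": [1, 0, 1, 0]})
result = hold_until_end(example)
```

Returning a new Series instead of writing into `df` leaves the input frame untouched, so the helper can be assigned with `df['result'] = hold_until_end(df)`.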
Upvotes: 0
Reputation: 140
I hope this helps and covers all the boundary conditions.
import pandas as pd
import datetime as dt
# Dummy data
df = pd.DataFrame()
df['init_signal'] = [0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0]
df['end_signal'] = [0,1,0,0,0,0,1,0,1,0,0,1,0,1,0,0,0,1,0]
df.index = [dt.date.today()+dt.timedelta(-i) for i in reversed(range(len(df)))]
# Cumsum for calculation
df['init_c'] = df['init_signal'].cumsum()
df['end_c'] = df['end_signal'].cumsum()
# result calculation
func = lambda x: (x==x.min()).shift().fillna(True)
df['result'] = df.groupby('init_c')['end_c'].transform(func)
df['result'] = df['result']&df['init_c'].astype(bool)
df['result'] = df['result'].astype(int)
# dropping temp columns
df.drop(['init_c','end_c'], axis=1, inplace=True)
print(df)
Output:
init_signal end_signal result
2018-12-12 0 0 0
2018-12-13 0 1 0
2018-12-14 0 0 0
2018-12-15 1 0 1
2018-12-16 0 0 1
2018-12-17 0 0 1
2018-12-18 0 1 1
2018-12-19 0 0 0
2018-12-20 1 1 1
2018-12-21 0 0 1
2018-12-22 0 0 1
2018-12-23 0 1 1
2018-12-24 0 0 0
2018-12-25 0 1 0
2018-12-26 0 0 0
2018-12-27 0 0 0
2018-12-28 1 0 1
2018-12-29 0 1 1
2018-12-30 0 0 0
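Why the grouping works: init_c labels each stretch that starts at an init_signal, and within a stretch end_c stays at its minimum until the first later end_signal. Shifting by one row keeps the end day itself switched on. Below is a compact sketch of the same idea as a function. The name `hold_until_end_cumsum` is hypothetical, and `shift(fill_value=True)` is swapped in for `shift().fillna(True)` so the intermediate result keeps a boolean dtype.

```python
import pandas as pd


def hold_until_end_cumsum(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper using the cumsum/groupby idea from this answer."""
    init_c = df["init_signal"].cumsum()  # label of the current init stretch
    end_c = df["end_signal"].cumsum()    # running count of end signals
    # Within each stretch, a row is still "open" while end_c has not risen
    # above its starting value; the shift keeps the end day itself open.
    still_open = end_c.groupby(init_c).transform(
        lambda x: (x == x.min()).shift(fill_value=True)
    )
    # Rows before the first init signal (init_c == 0) are always off.
    return (still_open.astype(bool) & (init_c > 0)).astype(int)


demo = pd.DataFrame({"init_signal": [1, 0, 0, 0], "end_signal": [1, 0, 1, 0]})
demo_result = hold_until_end_cumsum(demo)
```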
Upvotes: 3
Reputation: 131
I don't think there is a non-looping way to incorporate the logic you want (and if there is, it isn't simpler than a loop). See below...
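df['status'] = 0
# Note: df.xs(i-1) assumes a default integer index (0, 1, 2, ...)
for i in df.index:
    if df.xs(i)['init_signal'] == 1:
        # an init_signal always switches the status on
        df.at[i,'status'] = 1
    elif i != 0 and df.xs(i-1)['status'] == 1 and df.xs(i)['end_signal'] != 1:
        # stay on while the previous row was on and no end_signal arrives
        df.at[i,'status'] = 1
    else:
        df.at[i,'status'] = 0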
This creates a 'status' column that is zero until init_signal turns 'on' and stays 'on' until a day where end_signal = 1 and init_signal is not 1 (per your example 2). As written, the end_signal day itself is set to 0, whereas the question's examples keep that day at 1. Other non-looping solutions seem to run into trouble when multiple end/init signals are given in a row.
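If the end_signal day should stay at 1, as in the question's two examples, the loop can track an "active" flag instead of looking back at the previous row. This is a minimal sketch under that assumption; the function name `status_loop` is hypothetical, and iterating over NumPy arrays is a choice made here so the code does not depend on the index type.

```python
import pandas as pd


def status_loop(df: pd.DataFrame) -> pd.Series:
    """Hypothetical loop version: on from init_signal through the next end_signal."""
    init_values = df["init_signal"].to_numpy()
    end_values = df["end_signal"].to_numpy()
    status = []
    active = False
    for init_val, end_val in zip(init_values, end_values):
        if init_val == 1:
            # An init signal switches on; an end signal on the same row is ignored.
            active = True
            status.append(1)
        elif active:
            # Still on, including the end day itself; switch off afterwards.
            status.append(1)
            if end_val == 1:
                active = False
        else:
            status.append(0)
    return pd.Series(status, index=df.index, name="status")


sample = pd.DataFrame({"init_signal": [1, 0, 0, 0], "end_signal": [1, 0, 1, 0]})
sample_status = status_loop(sample)
```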
Upvotes: -1
Reputation: 153500
Let's try:
df1 = df.assign(end_signal = df['end_signal'].where(df['init_signal'].values != 1, 0))
df1 = df1.set_index('time')
(df1['init_signal'] + df1['end_signal'].shift().bfill()*-1).cumsum()
Output:
time 0
0 2016-06-13 1.0
1 2016-06-14 1.0
2 2016-06-15 1.0
3 2016-06-16 0.0
Upvotes: 0