Reputation: 23577

Setting values given two columns

I have two time-series data frames, both containing only the values 0 and 1. The first is called init_signal and the second is called end_signal. The idea is to create a new data frame in which, whenever init_signal has a 1, the value stays 1 up to and including the NEXT 1 in end_signal.

The examples below merge init_signal and end_signal into one data frame: the first column is init_signal and the second column is end_signal.

For example:

2016-06-13  1  0
2016-06-14  0  0
2016-06-15  0  1
2016-06-16  0  0

Will become:

2016-06-13  1  
2016-06-14  1  
2016-06-15  1  
2016-06-16  0

For example 2:

2016-06-13  1  1
2016-06-14  0  0
2016-06-15  0  1
2016-06-16  0  0

Will become:

2016-06-13  1  
2016-06-14  1  
2016-06-15  1  
2016-06-16  0

In the second example, a 1 in end_signal on the same row as a 1 in init_signal is ignored.
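For reproducibility, here is a small sketch that builds the two examples as data frames. The column names follow the description above; the variable names (example_1, example_2, expected) are only for illustration.

```python
import pandas as pd

dates = pd.to_datetime(["2016-06-13", "2016-06-14", "2016-06-15", "2016-06-16"])

# Example 1: one init signal, closed by the end signal two days later
example_1 = pd.DataFrame(
    {"init_signal": [1, 0, 0, 0], "end_signal": [0, 0, 1, 0]}, index=dates
)

# Example 2: same, but with an extra end signal on the init row (to be ignored)
example_2 = pd.DataFrame(
    {"init_signal": [1, 0, 0, 0], "end_signal": [1, 0, 1, 0]}, index=dates
)

# Desired result for both examples
expected = pd.Series([1, 1, 1, 0], index=dates, name="result")
```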

I've been using a complex loop. Is there a non-looping way?

Upvotes: 5

Answers (4)

Polkaguy6000

Reputation: 1208

This can be achieved in three steps and zero loops:

1. Create a single column with the init_signal and end_signal. (Records with both default to start signal only.)
2. Fill the dates with no activity.
3. Add the end signal back as active.

Note: I'm not sure how to handle cases at the beginning. I'm presuming that there will always be an init_signal preceding an end_signal. If that's not true, you'll need to build a handler for that, but that would be a simple conditional at the end of these steps.

Step 1: A single column

In this step we create a column that is populated only on days with an init_signal or an end_signal.

 import numpy as np
 df['result'] = np.nan
 #Add end dates
 mask = (df['end_signal'] == 1)
 df.loc[mask,'result'] = 0
 #Add init_signals.
 #Note: This will overwrite days that already have an end_signal
 mask = (df['init_signal'] == 1)
 df.loc[mask,'result'] = 1

Step 2: Fill the column

This takes the NaN values in result and replaces them with the most recent status.

 df['result'] = df['result'].ffill()

Step 3: Set end_signal to active.

By the definition in the question, dates with an end_signal should be 1, so we need to set those rows back to 1 in the data frame.

 mask = (df['end_signal'] == 1)
 df.loc[mask,'result'] = 1
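Putting the three steps together, here is a minimal sketch. The function name active_window is hypothetical. Two details are additions that go beyond the steps as written: leading rows with no signal yet are filled with 0, and an end_signal row is switched back on only if the previous row was active, so that stray end signals outside a window stay 0.

```python
import numpy as np
import pandas as pd


def active_window(df):
    """Return 1 from each init_signal up to and including the next end_signal."""
    result = pd.Series(np.nan, index=df.index)

    # Step 1: mark end rows as 0, then init rows as 1 (init wins on shared rows)
    result.loc[df["end_signal"] == 1] = 0
    result.loc[df["init_signal"] == 1] = 1

    # Step 2: carry the last known status forward; rows before any signal become 0
    result = result.ffill().fillna(0)

    # Step 3 (assumption): re-activate only end rows that close an open window,
    # i.e. rows whose previous status was 1
    closes_window = (df["end_signal"] == 1) & (result.shift(fill_value=0) == 1)
    result.loc[closes_window] = 1

    return result.astype(int)


# Example 2 from the question
df = pd.DataFrame({"init_signal": [1, 0, 0, 0], "end_signal": [1, 0, 1, 0]})
print(active_window(df).tolist())  # [1, 1, 1, 0]
```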

Upvotes: 0

Nitin Sharma

Reputation: 140

I hope this helps and covers all the boundary conditions.

import pandas as pd
import datetime as dt

# Dummy data
df = pd.DataFrame()
df['init_signal'] = [0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0]
df['end_signal']  = [0,1,0,0,0,0,1,0,1,0,0,1,0,1,0,0,0,1,0]
df.index = [dt.date.today()+dt.timedelta(-i) for i in reversed(range(len(df)))]

# Cumsum for calculation
df['init_c'] = df['init_signal'].cumsum()
df['end_c']  = df['end_signal'].cumsum()

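# result calculation: within each init group, rows stay on until the first
# end_signal after the init row; the shift keeps the end_signal row itself on
func = lambda x: (x==x.min()).shift(fill_value=True)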
df['result'] = df.groupby('init_c')['end_c'].transform(func)
df['result'] = df['result']&df['init_c'].astype(bool)
df['result'] = df['result'].astype(int)

# dropping temp columns
df.drop(['init_c','end_c'], axis=1, inplace=True)

print(df)

Output:

            init_signal  end_signal  result
2018-12-12            0           0       0
2018-12-13            0           1       0
2018-12-14            0           0       0
2018-12-15            1           0       1
2018-12-16            0           0       1
2018-12-17            0           0       1
2018-12-18            0           1       1
2018-12-19            0           0       0
2018-12-20            1           1       1
2018-12-21            0           0       1
2018-12-22            0           0       1
2018-12-23            0           1       1
2018-12-24            0           0       0
2018-12-25            0           1       0
2018-12-26            0           0       0
2018-12-27            0           0       0
2018-12-28            1           0       1
2018-12-29            0           1       1
2018-12-30            0           0       0

Upvotes: 3

psychopg2

Reputation: 131

I don't think there is a non-looping way to incorporate the logic you want (and if there is, it isn't simpler than a loop). See below...

# Assumes the default integer index (0, 1, 2, ...), because the loop reads row i-1
df['status'] = 0
for i in df.index:
    if df.xs(i)['init_signal'] == 1:
        df.at[i,'status'] = 1
    elif i != 0 and df.xs(i-1)['status'] == 1 and df.xs(i)['end_signal'] != 1:
        df.at[i,'status'] = 1
    else:
        df.at[i,'status'] = 0

This creates a 'status' column that stays 0 until init_signal turns it on, and then stays on until a row where end_signal = 1 and init_signal is not 1 (per your example 2). Note that this loop sets that closing row itself to 0. Other non-looping solutions seem to run into trouble when several end or init signals occur in a row.
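The same state machine can be sketched over plain Python lists, which avoids any dependence on the index type. The function name window_status is hypothetical. Unlike the loop above, this variant keeps the closing end_signal row at 1, as in the question's expected output.

```python
def window_status(init_signal, end_signal):
    """Return 1 from each init signal up to and including the next end signal."""
    status = []
    active = False
    for init, end in zip(init_signal, end_signal):
        if init == 1:
            # An init signal always opens (or re-opens) the window,
            # even if an end signal falls on the same row
            active = True
            status.append(1)
        elif active:
            # Still inside the window; the closing row itself stays on
            status.append(1)
            if end == 1:
                active = False
        else:
            # No window open: stray end signals are ignored
            status.append(0)
    return status


print(window_status([1, 0, 0, 0], [1, 0, 1, 0]))  # [1, 1, 1, 0]
```

With a data frame, the result could be assigned back with something like df['status'] = window_status(df['init_signal'], df['end_signal']).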

Upvotes: -1

Scott Boston

Reputation: 153500

Let's try:

df1 = df.assign(end_signal = df['end_signal'].where(df['init_signal'].values != 1, 0))
df1 = df1.set_index('time')
(df1['init_signal'] + df1['end_signal'].shift().bfill()*-1).cumsum()
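As a quick check, here is the same expression applied to example 2 from the question. This sketch assumes the dates live in a column named time, as the snippet above does; the sample frame is built here only for illustration. Because the result is a running sum, an end_signal that arrives while no window is open would push it below 0, so this relies on every end signal following an init signal.

```python
import pandas as pd

df = pd.DataFrame({
    "time": pd.to_datetime(["2016-06-13", "2016-06-14", "2016-06-15", "2016-06-16"]),
    "init_signal": [1, 0, 0, 0],
    "end_signal": [1, 0, 1, 0],
})

# Zero out end_signal on rows that also carry an init_signal
df1 = df.assign(end_signal=df["end_signal"].where(df["init_signal"].values != 1, 0))
df1 = df1.set_index("time")

# +1 on each init row, -1 on the row after each end row, then a running sum
result = (df1["init_signal"] + df1["end_signal"].shift().bfill() * -1).cumsum()
result = result.astype(int)

print(result.tolist())  # [1, 1, 1, 0]
```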