Behinoo

Reputation: 395

Pandas: Forward fill between rows based on a condition

I want to fill values between two numbers that come from two different columns. I have a data frame (df, generated below), and I only want to forward fill the Value column from a row that has a number in the Start column up to the next row that has a number in the End column. Start indicates that a process is starting and End that the process is ending, so Value should only be filled between these two rows, inclusive.

Here is code to generate the data; DF is what I want to get.

from datetime import datetime, timedelta
import pandas as pd
import numpy as np

np.random.seed(11)
date_today = datetime.now()
ndays = 10
df = pd.DataFrame({'date': [date_today + timedelta(days=x) for x in range(ndays)],
                   'Start': pd.Series(np.random.randn(ndays)),
                   'End': pd.Series(np.random.randn(ndays))})
df = df.set_index('date')
# Randomly blank out roughly 60% of the cells
df = df.mask(np.random.random(df.shape) < .6)
# Positional assignment with iloc avoids chained indexing and set-based indexers
df.iloc[[0, 1, 2, 5, 6, 9], df.columns.get_loc('End')] = np.nan
df.iloc[5, df.columns.get_loc('Start')] = 1
df.iloc[[1, 2, 3, 4, 8, 9], df.columns.get_loc('Start')] = np.nan
df['Value'] = np.nan
df.iloc[[0, 5], df.columns.get_loc('Value')] = [0.3, 0.1]
df

I want to obtain a dataframe (DF) that looks like this:

                                 End     Start  Value
date                                                 
2018-06-18 22:34:35.964286       NaN  1.749455    0.3
2018-06-19 22:34:35.964286       NaN       NaN    0.3
2018-06-20 22:34:35.964286       NaN       NaN    0.3
2018-06-21 22:34:35.964286  0.561192       NaN    0.3
2018-06-22 22:34:35.964286       NaN       NaN    NaN
2018-06-23 22:34:35.964286       NaN  1.000000    0.1
2018-06-24 22:34:35.964286       NaN       NaN    0.1
2018-06-25 22:34:35.964286       NaN       NaN    0.1
2018-06-26 22:34:35.964286 -0.031075       NaN    0.1
2018-06-27 22:34:35.964286       NaN       NaN    NaN

Thanks in advance

Upvotes: 3

Views: 2875

Answers (1)

Leo

Reputation: 1845

Try this: first forward fill Value. Then calculate the running number of 'events' (Start or End markers seen so far). Finally, replace values with NaN wherever the number of 'events' is even.

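# Forward fill everywhere first; fillna(method='ffill') is deprecated in recent pandas
df['Value'] = df['Value'].ffill()
# Running count of events: Start counts on its own row, End one row later
temp = (df['End'].shift().notnull().astype(int) + df['Start'].notnull().astype(int)).cumsum()
# An even count means no process is running, so clear Value there
df.loc[temp % 2 == 0, 'Value'] = np.nan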

Edit: we have to use shift() on 'End' because otherwise the 'End' events would be counted on the same row where they occur. We want each 'End' event to be counted on the row just after it occurs, so that 'Value' is not voided on the 'End' row itself.
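
To make the event counting easier to follow, here is a minimal sketch on fixed data rather than the random frame from the question. The numbers, the daily date index, and the helper names started, ended, events and inside are assumptions chosen for illustration. It uses where() instead of the .loc assignment, which should give the same result.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Hypothetical fixed data mirroring the question's layout (not the random frame)
df = pd.DataFrame({
    'Start': [1.7, nan, nan, nan, nan, 1.0, nan, nan, nan, nan],
    'End':   [nan, nan, nan, 0.5, nan, nan, nan, nan, -0.03, nan],
    'Value': [0.3, nan, nan, nan, nan, 0.1, nan, nan, nan, nan],
}, index=pd.date_range('2018-06-18', periods=10, freq='D'))

# A Start counts on its own row; an End counts one row later because of shift()
started = df['Start'].notna().astype(int)
ended = df['End'].shift().notna().astype(int)
events = (started + ended).cumsum()

# Odd count: a process is open. Even count: nothing is running.
inside = events % 2 == 1

# Forward fill, then keep the filled value only while a process is open
df['Value'] = df['Value'].ffill().where(inside)
print(pd.concat([df, events.rename('events')], axis=1))
```

Here events comes out as 1, 1, 1, 1, 2, 3, 3, 3, 3, 4, so rows 4 and 9 (the rows right after each End) are the only ones reset to NaN. Note that the parity trick relies on Start and End strictly alternating; two Start rows without an End in between would flip the count and void the wrong rows.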

Upvotes: 4
