Reputation: 395
I want to fill values between two markers that live in two different columns. I have a data frame (df, generated below), and I want to forward-fill the Value column only from a row that has a number in the Start column up to the next row that has a number in the End column. Start indicates that a process is beginning and End indicates that it is finishing, so Value should be filled only between those two rows.
Here is code to generate the data; DF is the result I want to get.
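from datetime import datetime, timedelta
import pandas as pd
import numpy as np
np.random.seed(11)
date_today = datetime.now()
ndays = 10
df = pd.DataFrame({'date': [date_today + timedelta(days=x) for x in range(ndays)],
'Start': pd.Series(np.random.randn(ndays)), 'End': pd.Series(np.random.randn(ndays))})
# Put End before Start, matching the column order of the desired output
df = df.set_index('date')[['End', 'Start']]
# Randomly blank out roughly 60% of the cells
df = df.mask(np.random.random(df.shape) < .6)
# Assign by position with iloc; set indexers and chained assignment are not reliable in recent pandas
df.iloc[[0, 1, 2, 5, 6, 9], df.columns.get_loc('End')] = np.nan
df.iloc[5, df.columns.get_loc('Start')] = 1
df.iloc[[1, 2, 3, 4, 8, 9], df.columns.get_loc('Start')] = np.nan
df['Value'] = np.nan
df.iloc[[0, 5], df.columns.get_loc('Value')] = [0.3, 0.1]
df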
I want to obtain a data frame (DF) that looks like this:
                                 End     Start  Value
date
2018-06-18 22:34:35.964286       NaN  1.749455    0.3
2018-06-19 22:34:35.964286       NaN       NaN    0.3
2018-06-20 22:34:35.964286       NaN       NaN    0.3
2018-06-21 22:34:35.964286  0.561192       NaN    0.3
2018-06-22 22:34:35.964286       NaN       NaN    NaN
2018-06-23 22:34:35.964286       NaN  1.000000    0.1
2018-06-24 22:34:35.964286       NaN       NaN    0.1
2018-06-25 22:34:35.964286       NaN       NaN    0.1
2018-06-26 22:34:35.964286 -0.031075       NaN    0.1
2018-06-27 22:34:35.964286       NaN       NaN    NaN
Thanks in advance
Upvotes: 3
Views: 2875
Reputation: 1845
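Try this: first forward-fill Value. Then keep a running count of 'events', where each Start and each End counts as one event. Finally, set Value back to NaN wherever that count is even, because an even count means every process started so far has already ended.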
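# Forward-fill everything first; rows outside a Start/End window are blanked below
df['Value'] = df['Value'].ffill()
# Running event count: +1 on each Start row, +1 on the row after each End
temp = (df['End'].shift().notnull().astype(int) + df['Start'].notnull().astype(int)).cumsum()
# An even count means no process is running, so void the filled value there
df.loc[temp % 2 == 0, 'Value'] = np.nan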
Edit: we have to use shift() on 'End' because otherwise each 'End' event would be counted on the same row where it occurs. We want 'End' events to be counted one row later, so that 'Value' is not voided on the 'End' row itself.
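To make the counting and the effect of shift() easier to see, here is a minimal sketch on a small hand-written frame. The toy data and the names starts, ends, events, events_no_shift and filled are made up for illustration and are not part of the question's data. It assumes Start and End markers strictly alternate, beginning with a Start.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Hypothetical toy frame: one process runs on rows 0-3, another on rows 5-8.
df = pd.DataFrame({
    'Start': [1.0, nan, nan, nan, nan, 1.0, nan, nan, nan, nan],
    'End':   [nan, nan, nan, 1.0, nan, nan, nan, nan, 1.0, nan],
    'Value': [0.3, nan, nan, nan, nan, 0.1, nan, nan, nan, nan],
})

starts = df['Start'].notna().astype(int)
# shift() moves each End marker down one row, so the End row itself
# is still counted as being inside the process.
ends = df['End'].shift().notna().astype(int)
events = (starts + ends).cumsum()

# Same count without shift(), kept only for comparison.
events_no_shift = (starts + df['End'].notna().astype(int)).cumsum()

# Keep the forward-filled value only where the event count is odd.
filled = df['Value'].ffill().where(events % 2 == 1)

print(pd.DataFrame({'events': events,
                    'events_no_shift': events_no_shift,
                    'Value': filled}))
```

With shift() the count first turns even on the row after each End, so rows 3 and 8 keep their filled value. In the events_no_shift column the count is already even on rows 3 and 8, which would blank Value on the End rows themselves.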
Upvotes: 4