Reputation: 1383
Initial DataFrame
index is_case value_a value_b
03/01/2005 True 0.598081665 0.189099313
04/01/2005 False 0.480809369 0.142255603
05/01/2005 False 0.963128886 0.422756089
06/01/2005 False 0.687675456 0.739599384
07/01/2005 True 0.513017431 0.397303797
08/01/2005 True 0.691884131 0.922642361
09/01/2005 False 0.659555415 0.222993436
10/01/2005 False 0.920539474 0.553573214
11/01/2005 False 0.360990121 0.535021421
12/01/2005 False 0.512528553 0.343931584
13/01/2005 False 0.083391071 0.277004714
14/01/2005 False 0.382696661 0.204780359
15/01/2005 False 0.838666246 0.337101306
16/01/2005 True 0.363920089 0.355211134
17/01/2005 False 0.354853214 0.691884131
18/01/2005 False 0.089324832 0.910276245
19/01/2005 False 0.611991454 0.513667459
20/01/2005 True 0.210785609 0.839849547
Desired Output:
The important thing is that the result contains the output column and the index. Other columns may or may not be in the output.
index is_case value_a value_b output
03/01/2005 True 0.598081665 0.189099313 06/01/2005
07/01/2005 True 0.513017431 0.397303797 08/01/2005
08/01/2005 True 0.691884131 0.922642361 17/01/2005
16/01/2005 True 0.363920089 0.355211134 17/01/2005
20/01/2005 True 0.210785609 0.839849547 NaN
Transformation Logic
For every row where is_case is True, take its value_a and search from the next row onwards for a value_b that is greater than or equal to that value_a; the index of the first row that meets this criterion goes into the output column.
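Written out as a naive loop (purely to illustrate the rule; this assumes a default RangeIndex with index kept as an ordinary column, as in the frame above):
import numpy as np

output = []
for pos in np.flatnonzero(df['is_case'].to_numpy()):
    later = df.iloc[pos + 1:]                                 # rows strictly after the case row
    hits = later.loc[later['value_b'] >= df['value_a'].iat[pos], 'index']
    output.append(hits.iloc[0] if len(hits) else np.nan)      # first match, or NaN if none
output then has one entry per True row, in order.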
Upvotes: 0
Views: 136
Reputation: 323326
Using numpy with idxmax:
import numpy as np
import pandas as pd

# pairwise differences value_b[j] - value_a[i]; keep only future rows (j > i)
mm = np.triu(df['value_b'].values - df['value_a'].values[:, None], 1)
mm[np.tril_indices(mm.shape[0], 0)] = np.nan
temp = pd.DataFrame(mm)

df1 = df.loc[df['is_case']].copy()
df1['New'] = np.nan
# column position of the first non-negative difference for each case row
matches = (temp >= 0).replace(False, np.nan).idxmax(1)[df['is_case']].dropna().astype(int)
df1.loc[matches.index, 'New'] = df['index'].iloc[matches.values].values
df1
Out[1098]:
index is_case value_a value_b New
0 03/01/2005 True 0.598082 0.189099 06/01/2005
4 07/01/2005 True 0.513017 0.397304 08/01/2005
5 08/01/2005 True 0.691884 0.922642 17/01/2005
13 16/01/2005 True 0.363920 0.355211 17/01/2005
17 20/01/2005 True 0.210786 0.839850 NaN
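To place the matches back on the original frame under the requested column name, the New column can simply be assigned across; index alignment leaves NaN on the non-case rows (a sketch assuming the default integer index shown above):
df['output'] = df1['New']   # aligns on the shared index; non-case rows stay NaN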
Upvotes: 2
Reputation: 153500
IIUC, and if your dataframes aren't too big, you can use a cartesian join and filters, then drop duplicates to keep the first match for each case row, like this:
import pandas as pd

df_is_case = df[df['is_case']]

# cartesian join of the case rows against every row, then keep only later
# rows whose value_b is at least the case row's value_a
df_joined = (df_is_case.assign(key=1)
             .merge(df.assign(key=1), on='key', suffixes=('', '_y'))
             .query('index < index_y and value_a <= value_b_y'))

# concat the case rows back in so unmatched ones survive with NaN,
# then keep only the first (earliest) match per case row
df_out = (pd.concat([df_joined, df_is_case])
          .drop_duplicates(subset='index')[['index', 'is_case', 'value_a', 'value_b', 'index_y']]
          .rename(columns={'index_y': 'output'}))
print(df_out)
Output:
index is_case value_a value_b output
3 03/01/2005 True 0.598082 0.189099 06/01/2005
23 07/01/2005 True 0.513017 0.397304 08/01/2005
50 08/01/2005 True 0.691884 0.922642 17/01/2005
68 16/01/2005 True 0.363920 0.355211 17/01/2005
17 20/01/2005 True 0.210786 0.839850 NaN
Upvotes: 3
Reputation: 68
With df as your DataFrame:
import numpy as np

df_iscase = df[df.is_case]

def transform(se):
    remaining = df.loc[se.name:]
    if len(remaining) < 2:          # last row: nothing comes after it
        return np.nan
    remaining = remaining.iloc[1:]  # all following rows (excluding the row itself)
    remaining = remaining[remaining.value_b >= se.value_a]  # rows where b >= a
    if len(remaining) < 1:          # no match at all
        return np.nan
    return remaining.iloc[0].name   # index label of the first row where b >= a

df_iscase.apply(transform, axis=1)
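If the dates are the frame's actual index (so that .name returns a date label rather than an integer position), the result can be stored under the requested column name, for example:
df_iscase = df_iscase.copy()                        # avoid writing into a view of df
df_iscase['output'] = df_iscase.apply(transform, axis=1)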
Upvotes: 1