joaoavf
joaoavf

Reputation: 1383

Returning a pandas DataFrame with the indexes of first next cases of >= value

Initial DataFrame

index        is_case        value_a         value_b
03/01/2005      True    0.598081665     0.189099313
04/01/2005     False    0.480809369     0.142255603
05/01/2005     False    0.963128886     0.422756089
06/01/2005     False    0.687675456     0.739599384
07/01/2005      True    0.513017431     0.397303797
08/01/2005      True    0.691884131     0.922642361
09/01/2005     False    0.659555415     0.222993436
10/01/2005     False    0.920539474     0.553573214
11/01/2005     False    0.360990121     0.535021421
12/01/2005     False    0.512528553     0.343931584
13/01/2005     False    0.083391071     0.277004714
14/01/2005     False    0.382696661     0.204780359
15/01/2005     False    0.838666246     0.337101306
16/01/2005      True    0.363920089     0.355211134
17/01/2005     False    0.354853214     0.691884131
18/01/2005     False    0.089324832     0.910276245
19/01/2005     False    0.611991454     0.513667459
20/01/2005      True    0.210785609     0.839849547

Desired Output:

The important thing about the output is to contain the column output and the index. Other columns may or may not be in the output.

index        is_case        value_a         value_b          output
03/01/2005      True    0.598081665     0.189099313      06/01/2005
07/01/2005      True    0.513017431     0.397303797      08/01/2005
08/01/2005      True    0.691884131     0.922642361      17/01/2005
16/01/2005      True    0.363920089     0.355211134      17/01/2005
20/01/2005      True    0.210785609     0.839849547             NaN

Transformation Logic

Get the value_a of all rows where is_case is True and search from the next row onwards for a value_b which is greater or equal than value_a, it returns the first index that meets this criteria in column output.

Upvotes: 0

Views: 136

Answers (3)

BENY
BENY

Reputation: 323326

Using numpy with idxmax

mm=np.triu(-df['value_a'].values[:,None]+df['value_b'].values,1)
mm[np.tril_indices(mm.shape[0], 0)] = np.nan
temp=pd.DataFrame(mm)

df1=df.loc[df['is_case']]
df1['New']=np.nan
df1['New'].iloc[:4]=df['index'].iloc[(temp>=0).replace(False,np.nan).idxmax(1)[df['is_case']].dropna().astype(int).values].values


df1
Out[1098]: 
         index  is_case   value_a   value_b         New
0   03/01/2005     True  0.598082  0.189099  06/01/2005
4   07/01/2005     True  0.513017  0.397304  08/01/2005
5   08/01/2005     True  0.691884  0.922642  17/01/2005
13  16/01/2005     True  0.363920  0.355211  17/01/2005
17  20/01/2005     True  0.210786  0.839850         NaN

Upvotes: 2

Scott Boston
Scott Boston

Reputation: 153500

IIUC, and if your dataframes aren't too big, you can use a cartesian join and filters, then drop duplicates to get the first value matches like this:

df_is_case = df[df['is_case'] == True]
df_joined = df_is_case.assign(key=1)\
                         .merge(df.assign(key=1), 
                                on='key', 
                                suffixes=('','_y'))\
                         .query('index < index_y and value_a <= value_b_y')

df_out = pd.concat([df_joined, df_is_case])\
           .drop_duplicates(subset='index')[['index', 'is_case', 'value_a', 'value_b', 'index_y']]\
           .rename(columns={'index_y':'output'})

print(df_out)

Output:

         index  is_case   value_a   value_b      output
3   03/01/2005     True  0.598082  0.189099  06/01/2005
23  07/01/2005     True  0.513017  0.397304  08/01/2005
50  08/01/2005     True  0.691884  0.922642  17/01/2005
68  16/01/2005     True  0.363920  0.355211  17/01/2005
17  20/01/2005     True  0.210786  0.839850         NaN

Upvotes: 3

AlexFC
AlexFC

Reputation: 68

with df as your data frame

df_iscase = df[df.is_case]

def transform(se):
    remaining = df.loc[se.name:]
    if len(remaining) < 2: # check if not last value
        return np.nan
    remaining = remaining.iloc[1:] # grab all next value (removing self)
    remaining = remaining[remaining.value_b >= se.value_a] # all where b >= a
    if len(remaining) < 1: # if not any
        return np.nan
    return remaining.iloc[0].name # return first value where b >= a

df_iscase.apply(transform, axis=1)

Upvotes: 1

Related Questions