Reputation: 7723
I had a question earlier which is deleted and now modified to a less verbose form for you to read easily.
I have a dataframe as given below
df = pd.DataFrame({'subject_id' :[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2],'day':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20] , 'PEEP' :[7,5,10,10,11,11,14,14,17,17,21,21,23,23,25,25,22,20,26,26,5,7,8,8,9,9,13,13,15,15,12,12,15,15,19,19,19,22,22,15]})
df['fake_flag'] = ''
I would like to fill values in column fake_flag
based on the below rules
1) if preceding two rows are constant (ex:5,5) or decreasing (7,5), then pick the highest of the two rows. In this case, it is 7 from (7,5) and 5 from (5,5)
2) Check whether the current row is greater than the output from rule 1 by 3 or more points (>=3) and it repeats in another (next) row (2 occurrences of same value). It can be 8/gt 8(if rule 1 output is 5). ex: (8 in row n
,8 in row n+1
or 10 in row n
,10 in row n+1
) If yes, then key in fake VAC
in the fake_flag column
This is what I tried
for i in t1.index:
if i >=2:
print("current value is ", t1[i])
print("preceding 1st (n-1) ", t1[i-1])
print("preceding 2nd (n-2) ", t1[i-2])
if (t1[i-1] == t1[i-2] or t1[i-2] >= t1[i-1]): # rule 1 check
r1_output = t1[i-2] # we get the max of these two values (t1[i-2]), it doesn't matter when it's constant(t1[i-2] or t1[i-1]) will have the same value anyway
print("rule 1 output is ", r1_output)
if t1[i] >= r1_output + 3:
print("found a value for rule 2", t1[i])
print("check for next value is same as current value", t1[i+1])
if (t1[i]==t1[i+1]): # rule 2 check
print("fake flag is being set")
df['fake_flag'][i] = 'fake_vac'
This check should happen for all records (one by one) for each subject_id. I have a dataset which has million records. Any efficient and elegant solution is helpful. I can't run a loop over million records.
I expect my output to be like as shown below
subject_id = 1
subject_id = 2
Upvotes: 0
Views: 83
Reputation: 4638
import pandas as pd
df = pd.DataFrame({'subject_id' :[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2],'day':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20] , 'PEEP' :[7,5,10,10,11,11,14,14,17,17,21,21,23,23,25,25,22,20,26,26,5,7,8,8,9,9,13,13,15,15,12,12,15,15,19,19,19,22,22,15]})
df['shift1']=df['PEEP'].shift(1)
df['shift2']=df['PEEP'].shift(2)
df['fake_flag'] = np.where((df['shift1'] ==df['shift2']) | (df['shift1'] < df['shift2']), 'fake VAC', '')
df.drop(['shift1','shift2'],axis=1)
Output
0 1 1 7
1 1 2 5
2 1 3 10 fake VAC
3 1 4 10
4 1 5 11 fake VAC
5 1 6 11
6 1 7 14 fake VAC
7 1 8 14
8 1 9 17 fake VAC
9 1 10 17
10 1 11 21 fake VAC
11 1 12 21
12 1 13 23 fake VAC
13 1 14 23
14 1 15 25 fake VAC
15 1 16 25
16 1 17 22 fake VAC
17 1 18 20 fake VAC
18 1 19 26 fake VAC
19 1 20 26
20 2 1 5 fake VAC
21 2 2 7 fake VAC
22 2 3 8
23 2 4 8
24 2 5 9 fake VAC
25 2 6 9
26 2 7 13 fake VAC
Upvotes: 3