Danish
Danish

Reputation: 2871

divide a column based on groupby or looping conditions in pandas

I have a data frame as shown below

   B_ID   No_Show   Session  slot_num   Patient_count
    1     0.2       S1        1          1
    2     0.3       S1        2          1
    3     0.8       S1        3          1
    4     0.3       S1        3          2
    5     0.6       S1        4          1
    6     0.8       S1        5          1
    7     0.9       S1        5          2
    8     0.4       S1        5          3
    9     0.6       S1        5          4
    12    0.9       S2        1          1
    13    0.5       S2        1          2
    14    0.3       S2        2          1
    15    0.7       S2        3          1
    20    0.7       S2        4          1
    16    0.6       S2        5          1
    17    0.8       S2        5          2
    19    0.3       S2        5          3

where

No_Show = Probability of no show

Assume that

threshold probability = 0.2 Duration for each slot = 30 (minutes)

From the above I would like calculate below data frame

Step1

sort the dataframe based on Session, slot_number and Patient_count

df = df.sort_values(['Session', 'slot_num', 'Patient_count'], ascending=False)

step 2 Calculate the cut off by using below conditions

if patient_count = 1 Divide No_show by threshold probability if patient_count = 1

Example for B_ID = 3, Patient_count = 1, cut_off = 0.8/0.2 = 4

else if patient_count = 2 multiply previously 1 No_Show with current No_show and divide with threshold)

Example for B_ID = 4, Patient_count = 2, cut_off = (0.3*0.8)/0.2 = 1.2

else if patient_count = 3 multiply previously 2 No_Show with current No_show and divide with threshold

Example for B_ID = 8, Patient_count = 3, cut_off = (0.4*0.9*0.8)/0.2 = 1.44

And so on

The Expected Output:

      B_ID   No_Show   Session  slot_num   Patient_count  Cut_off
        1     0.2       S1        1          1             1
        2     0.3       S1        2          1             1.5
        3     0.8       S1        3          1             4
        4     0.3       S1        3          2             1.2
        5     0.6       S1        4          1             3
        6     0.8       S1        5          1             4
        7     0.9       S1        5          2             3.6
        8     0.4       S1        5          3             1.44
        9     0.6       S1        5          4             0.864
        12    0.9       S2        1          1             4.5
        13    0.5       S2        1          2             2.25
        14    0.3       S2        2          1             1.5
        15    0.7       S2        3          1             3.5
        20    0.7       S2        4          1             3.5
        16    0.6       S2        5          1             3
        17    0.8       S2        5          2             2.4
        19    0.3       S2        5          3             0.72

Upvotes: 2

Views: 47

Answers (1)

jezrael
jezrael

Reputation: 862921

Use GroupBy.cumprod and divide by probability by Series.div:

probability = 0.2
df['new'] = df.groupby(['Session','slot_num'])['No_Show'].cumprod().div(probability)
print (df)
    B_ID  No_Show Session  slot_num  Patient_count    new
0      1      0.2      S1         1              1  1.000
1      2      0.3      S1         2              1  1.500
2      3      0.8      S1         3              1  4.000
3      4      0.3      S1         3              2  1.200
4      5      0.6      S1         4              1  3.000
5      6      0.8      S1         5              1  4.000
6      7      0.9      S1         5              2  3.600
7      8      0.4      S1         5              3  1.440
8      9      0.6      S1         5              4  0.864
9     12      0.9      S2         1              1  4.500
10    13      0.5      S2         1              2  2.250
11    14      0.3      S2         2              1  1.500
12    15      0.7      S2         3              1  3.500
13    20      0.7      S2         4              1  3.500
14    16      0.6      S2         5              1  3.000
15    17      0.8      S2         5              2  2.400
16    19      0.3      S2         5              3  0.720

Upvotes: 3

Related Questions