How to check between which threshold level does a value lie in?

Question

I have a dataframe having columns looking like this (having 1400 unique contextIDs and 28 different IndicatorIDs):

ContextID   IndicatorID threshold_values    AlarmLevel  actual_values
7289972        204511   -6.10904                 -1         0
7289972        204511   -12.1848                 -2         0
7289972        204511   -18.2606                 -3         0
7289972        204511   18.19404                  1         0
7289972        204511   24.2698                   2         0
7289972        204511   30.34557                  3         0
7289972        204512   89.94568                  1        64.114
7289972        204512   104.2932                  2        64.114
7289972        204512   118.6407                  3        64.114
7289972        204512   32.55574                 -1        64.114
7289972        204512   18.20825                 -2        64.114
7289972        204512   3.860765                 -3        64.114
7289998        204511   -6.10904                 -1           1
7289998        204511   -12.1848                 -2           1
7289998        204511   -18.2606                 -3           1
7289998        204511   18.19404                  1           1
7289998        204511   24.2698                   2           1
7289998        204511   30.34557                  3           1
7289998        204512   89.94568                  1        64.111
7289998        204512   104.2932                  2        64.111
7289998        204512   118.6407                  3        64.111
7289998        204512   32.55574                 -1        64.111
7289998        204512   18.20825                 -2        64.111
7289998        204512   3.860765                 -3        64.111

The actual_values column is the real value read by a sensor of a machine. The threshold_values columns contain various thresholds defined for various indicators (in the IndicatorID column), depending on which an alarm will be raised if the value crosses a certain limit.

Example: If a value in actual_values lies between the threshold_values defined for alarm level -1 and +1 the product is not defective. But, if the value lies between -1 and -2, an alarm of -1 (since it has crossed the threshold that was defined for -1) must be raised and if the value lies between +1 and +2, an alarm of +1 must be raised, and so on. In the end, the biggest alarm level must be assigned to the ContextID, meaning, if one indicator has raised an alarm of +1 and a second indicator has raised an alarm of -2, the alarm level of -2 must be considered greater and assigned as the final alarm to that ContextID (preferably in a new column).

I wanted some help in implementing this concept. I would like to know if such an implementation can be coded.

I am trying to implement it using 2 different for loops, one for all the ContextIDs and other for the IndicatorIDs, but somehow I am failing in coming up with the logic that can achieve this task.

I would be grateful for help and guidance.

Thanks

Edit 1:

Example:

ContextID   IndicatorID threshold_values    AlarmLevel  actual_values   thresh_high alarm_high  insideThresh
7291899 204515  0.708226    -3  0.949486    0.742542    -2  FALSE
7291899 204515  0.742542    -2  0.949486    0.76        -1  FALSE
7291899 204515  0.76        -1  0.949486    0.914122     1  FALSE
7291899 204515  0.914122    1   0.949486    0.948438     2  FALSE
7291899 204515  0.948438    2   0.949486    0.982754     3  TRUE
7291899 204515  0.982754    3   0.949486    610.9839    -3  FALSE

The thresh_value of 610.9839 belongs to a different IndicatorID (204516), but this value is being used to compute the alarm level of IndicatorID (204515)

Matt W. · Accepted Answer

Sure there is a way to do this. Probably better ways than the one below, but this will work.

Initialize Data:

import pandas as pd
import numpy as np

thresh = [-6.10904,
-12.1848,
-18.2606,
18.19404,
24.2698,
30.34557,
89.94568,
104.2932,
118.6407,
32.55574,
18.20825,
3.860765]

df = pd.DataFrame({'ContextID':[1]*12+[2]*12,
                   'IndicatorID':[5]*6+[6]*6+[7]*6+[8]*6,
                   'threshold_values':thresh*2,
                   'AlarmLevel':[-1, -2, -3, 1, 2, 3, 3, 2, 1, -1, -2, -3]*2,
                   'actual_values':[-17]*6+[64.114]*6+[26]*6+[64.111]*6})

I simplified the ContextID and IndicatorID, I also put some fake values in for the actual_values because yours all are within the proper range. We want to see what happens when they go outside of the proper range.

df = df.sort_values(['ContextID', 'IndicatorID', 'AlarmLevel'])
df['thresh_high'] = df.groupby(['ContextID', 'IndicatorID'])['threshold_values'].shift(-1)
df['alarm_high'] = df.groupby(['ContextID', 'IndicatorID'])['AlarmLevel'].shift(-1)
df['thresh_high'] = df.thresh_high.fillna(np.Inf)
df['alarm_high'] = df.alarm_high.fillna(4)
df['insideThresh'] = (df.actual_values < df.thresh_high) & (df.actual_values > df.threshold_values)

We sort the dataframe, and then create thresh_high and alarm_high which are shifted versions of threshold_values and AlarmLevel

Then we create a column that just shows whether or not the actual value landed inbetween the thresholds.

alarms = df.loc[df.insideThresh == True] \
            .groupby(['ContextID', 'IndicatorID', 'insideThresh'])['AlarmLevel'] \
            .apply(lambda x: x.min()+1 if x.min() < 0 else x.min()

)

Lastly we filter the dataframe only for the times where the actual_values sat in the thresholds, and then we group by ContextId, IndicatorID, and insideThresh (this last one isn't really needed). We take the Alarm Level and apply a custom function telling it that if the minimum of the AlarmLevel from where it breached is negative, take the level up 1, otherwise take the minimum.

How to check between which threshold level does a value lie in?

Answers (1)

Related Questions