some_programmer
some_programmer

Reputation: 3568

How to check between which threshold level does a value lie in?

I have a dataframe having columns looking like this (having 1400 unique contextIDs and 28 different IndicatorIDs):

ContextID   IndicatorID threshold_values    AlarmLevel  actual_values
7289972        204511   -6.10904                 -1         0
7289972        204511   -12.1848                 -2         0
7289972        204511   -18.2606                 -3         0
7289972        204511   18.19404                  1         0
7289972        204511   24.2698                   2         0
7289972        204511   30.34557                  3         0
7289972        204512   89.94568                  1        64.114
7289972        204512   104.2932                  2        64.114
7289972        204512   118.6407                  3        64.114
7289972        204512   32.55574                 -1        64.114
7289972        204512   18.20825                 -2        64.114
7289972        204512   3.860765                 -3        64.114
7289998        204511   -6.10904                 -1           1
7289998        204511   -12.1848                 -2           1
7289998        204511   -18.2606                 -3           1
7289998        204511   18.19404                  1           1
7289998        204511   24.2698                   2           1
7289998        204511   30.34557                  3           1
7289998        204512   89.94568                  1        64.111
7289998        204512   104.2932                  2        64.111
7289998        204512   118.6407                  3        64.111
7289998        204512   32.55574                 -1        64.111
7289998        204512   18.20825                 -2        64.111
7289998        204512   3.860765                 -3        64.111

The actual_values column is the real value read by a sensor of a machine. The threshold_values columns contain various thresholds defined for various indicators (in the IndicatorID column), depending on which an alarm will be raised if the value crosses a certain limit.

Example: If a value in actual_values lies between the threshold_values defined for alarm level -1 and +1 the product is not defective. But, if the value lies between -1 and -2, an alarm of -1 (since it has crossed the threshold that was defined for -1) must be raised and if the value lies between +1 and +2, an alarm of +1 must be raised, and so on. In the end, the biggest alarm level must be assigned to the ContextID, meaning, if one indicator has raised an alarm of +1 and a second indicator has raised an alarm of -2, the alarm level of -2 must be considered greater and assigned as the final alarm to that ContextID (preferably in a new column).

I wanted some help in implementing this concept. I would like to know if such an implementation can be coded.

I am trying to implement it using 2 different for loops, one for all the ContextIDs and other for the IndicatorIDs, but somehow I am failing in coming up with the logic that can achieve this task.

I would be grateful for help and guidance.

Thanks

Edit 1:

Example:

ContextID   IndicatorID threshold_values    AlarmLevel  actual_values   thresh_high alarm_high  insideThresh
7291899 204515  0.708226    -3  0.949486    0.742542    -2  FALSE
7291899 204515  0.742542    -2  0.949486    0.76        -1  FALSE
7291899 204515  0.76        -1  0.949486    0.914122     1  FALSE
7291899 204515  0.914122    1   0.949486    0.948438     2  FALSE
7291899 204515  0.948438    2   0.949486    0.982754     3  TRUE
7291899 204515  0.982754    3   0.949486    610.9839    -3  FALSE

The thresh_value of 610.9839 belongs to a different IndicatorID (204516), but this value is being used to compute the alarm level of IndicatorID (204515)

Upvotes: 2

Views: 377

Answers (1)

Matt W.
Matt W.

Reputation: 3722

Sure there is a way to do this. Probably better ways than the one below, but this will work.

Initialize Data:

import pandas as pd
import numpy as np

thresh = [-6.10904,
-12.1848,
-18.2606,
18.19404,
24.2698,
30.34557,
89.94568,
104.2932,
118.6407,
32.55574,
18.20825,
3.860765]

df = pd.DataFrame({'ContextID':[1]*12+[2]*12,
                   'IndicatorID':[5]*6+[6]*6+[7]*6+[8]*6,
                   'threshold_values':thresh*2,
                   'AlarmLevel':[-1, -2, -3, 1, 2, 3, 3, 2, 1, -1, -2, -3]*2,
                   'actual_values':[-17]*6+[64.114]*6+[26]*6+[64.111]*6})

I simplified the ContextID and IndicatorID, I also put some fake values in for the actual_values because yours all are within the proper range. We want to see what happens when they go outside of the proper range.

df = df.sort_values(['ContextID', 'IndicatorID', 'AlarmLevel'])
df['thresh_high'] = df.groupby(['ContextID', 'IndicatorID'])['threshold_values'].shift(-1)
df['alarm_high'] = df.groupby(['ContextID', 'IndicatorID'])['AlarmLevel'].shift(-1)
df['thresh_high'] = df.thresh_high.fillna(np.Inf)
df['alarm_high'] = df.alarm_high.fillna(4)
df['insideThresh'] = (df.actual_values < df.thresh_high) & (df.actual_values > df.threshold_values)

We sort the dataframe, and then create thresh_high and alarm_high which are shifted versions of threshold_values and AlarmLevel

Then we create a column that just shows whether or not the actual value landed inbetween the thresholds.

alarms = df.loc[df.insideThresh == True] \
            .groupby(['ContextID', 'IndicatorID', 'insideThresh'])['AlarmLevel'] \
            .apply(lambda x: x.min()+1 if x.min() < 0 else x.min()

)

Lastly we filter the dataframe only for the times where the actual_values sat in the thresholds, and then we group by ContextId, IndicatorID, and insideThresh (this last one isn't really needed). We take the Alarm Level and apply a custom function telling it that if the minimum of the AlarmLevel from where it breached is negative, take the level up 1, otherwise take the minimum.

Upvotes: 2

Related Questions