How to remove outliers specific to each timestamp?

Question

I have the data frame below, which holds time-series data that I process before feeding it to my prediction models.

import pandas as pd

# Three readings per hourly timestamp
df = pd.DataFrame({"timestamp": [pd.Timestamp('2019-01-01 01:00:00', tz=None),
                               pd.Timestamp('2019-01-01 01:00:00', tz=None),
                               pd.Timestamp('2019-01-01 01:00:00', tz=None),
                               pd.Timestamp('2019-01-01 02:00:00', tz=None),
                               pd.Timestamp('2019-01-01 02:00:00', tz=None),
                               pd.Timestamp('2019-01-01 02:00:00', tz=None),
                               pd.Timestamp('2019-01-01 03:00:00', tz=None),
                               pd.Timestamp('2019-01-01 03:00:00', tz=None),
                               pd.Timestamp('2019-01-01 03:00:00', tz=None)],
                   "value": [5.4, 5.1, 100.8, 20.12, 21.5, 80.08, 150.09, 160.12, 20.06]
                  })

From this, I take the mean of the values for each timestamp and send that mean as the input to the predictor. Currently I use fixed thresholds to filter out the outliers, but these remove some real values and also miss some outliers.

For example, I used

df[(df['value'] > 3) & (df['value'] < 120)]

and this does not filter out

2019-01-01 01:00:00 100.8

which is an outlier for that timestamp, but it does filter out

2019-01-01 03:00:00 150.09
2019-01-01 03:00:00 160.12

which are not outliers for that timestamp.

So how do I filter out outliers for each timestamp, based on which values do not fit the rest of that timestamp's group?

Any help is appreciated.
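
One possible way to do this, as a rough sketch and not a definitive answer: compare each value with the median of its own timestamp group, and drop values that sit too many median absolute deviations (MAD) away from it. The cutoff of 3 and the names `THRESHOLD`, `keep`, `filtered` and `means` are assumptions made for illustration. With only three readings per timestamp, any rule of this kind is fragile, so the cutoff would need tuning on real data.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2019-01-01 01:00:00"] * 3
        + ["2019-01-01 02:00:00"] * 3
        + ["2019-01-01 03:00:00"] * 3
    ),
    "value": [5.4, 5.1, 100.8, 20.12, 21.5, 80.08, 150.09, 160.12, 20.06],
})

# Assumed cutoff: keep values within 3 MADs of their group's median.
THRESHOLD = 3.0

# Median of each timestamp group, broadcast back to every row.
group_median = df.groupby("timestamp")["value"].transform("median")

# Absolute distance of each value from its own group's median.
abs_dev = (df["value"] - group_median).abs()

# Median absolute deviation per group, again broadcast to every row.
mad = abs_dev.groupby(df["timestamp"]).transform("median")

# Caveat: if a group's MAD is 0 (more than half of its values are identical),
# only values exactly equal to the median pass this test.
keep = abs_dev <= THRESHOLD * mad

filtered = df[keep]
means = filtered.groupby("timestamp")["value"].mean()
print(filtered)
print(means)
```

On the sample data this keeps 5.4 and 5.1 for 01:00, 20.12 and 21.5 for 02:00, and 150.09 and 160.12 for 03:00, so 100.8 is dropped without losing the two large but consistent 03:00 readings. The median is used here in place of the mean and standard deviation because, in a group of three, a single extreme value inflates the standard deviation enough to hide itself.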
