Reputation: 5542
I have a pandas Series and I want to check whether each value is within a certain range of the previous value (say, 10% above or below) and replace it with NaN if it is not. I am not sure how to proceed. The standard outlier removal techniques mostly rely on global statistics such as the overall standard deviation.
How can I access the previous value at every step and operate on it?
2018-09-06 NaN
2018-09-07 NaN
2018-09-08 NaN
2018-09-09 662.105
2018-09-10 651.010
2018-09-11 454.870
2018-09-12 597.840
2018-09-13 662.405
2018-09-14 660.735
2018-09-15 671.065
2018-09-16 668.485
2018-09-17 666.205
2018-09-18 663.620
2018-09-19 663.320
2018-09-20 662.715
2018-09-21 665.145
2018-09-22 663.015
2018-09-23 663.775
2018-09-24 662.860
2018-09-25 663.315
2018-09-26 665.600
2018-09-27 664.080
2018-09-28 661.800
2018-09-29 659.825
2018-09-30 659.370
2018-10-01 NaN
2018-10-02 NaN
2018-10-03 NaN
2018-10-04 NaN
Upvotes: 2
Views: 76
Reputation: 38415
You can use pct_change, as @ALollz mentioned in the comments. Use Series.loc to set the values where the condition is not met to NaN.
ts.loc[ts.pct_change().abs() > 0.1] = np.nan
2018-09-06 NaN
2018-09-07 NaN
2018-09-08 NaN
2018-09-09 662.105
2018-09-10 651.010
2018-09-11 NaN
2018-09-12 NaN
2018-09-13 NaN
2018-09-14 660.735
2018-09-15 671.065
2018-09-16 668.485
2018-09-17 666.205
2018-09-18 663.620
2018-09-19 663.320
2018-09-20 662.715
2018-09-21 665.145
2018-09-22 663.015
2018-09-23 663.775
2018-09-24 662.860
2018-09-25 663.315
2018-09-26 665.600
2018-09-27 664.080
2018-09-28 661.800
2018-09-29 659.825
2018-09-30 659.370
2018-10-01 NaN
2018-10-02 NaN
2018-10-03 NaN
2018-10-04 NaN
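For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea on the first few rows of the question's data. The names `change` and `cleaned` are just illustrative, and `fill_method=None` is my own choice here so that NaN gaps are not forward-filled before the comparison; `Series.mask` is used instead of in-place assignment so the original series is left untouched.

```python
import numpy as np
import pandas as pd

# First rows of the series from the question (2018-09-08 .. 2018-09-14).
ts = pd.Series(
    [np.nan, 662.105, 651.010, 454.870, 597.840, 662.405, 660.735],
    index=pd.date_range("2018-09-08", periods=7),
)

# Relative change from the previous row; NaN where there is no previous value.
change = ts.pct_change(fill_method=None)

# mask() returns a new Series with NaN wherever the condition is True.
cleaned = ts.mask(change.abs() > 0.1)
print(cleaned)
```

Note that each value is compared with the raw previous value, so a single spike also knocks out the values right after it until the series is back within 10% of its predecessor.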
Upvotes: 4
Reputation: 687
Because you need state (the previous row's value matters), a plain apply or NumPy operation is not enough; you need to iterate through the rows. Here is something that will do that: every time it finds an outlier it sets it to NaN and then recursively restarts itself, so that the outlier doesn't affect the following value. For this to work, the Series index must be unique.
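import pandas as pd

def remove_outliers(s, start=1):
    # Positions below refer to the series with NaN rows dropped.
    tmp = s.dropna()
    for pos in range(max(start, 1), len(tmp)):
        # replace with custom condition, tmp.iloc[pos - 1] is the previous value
        if not (0.9 < tmp.iloc[pos] / tmp.iloc[pos - 1] < 1.1):
            # modify the original series in place (requires unique index labels)
            s.loc[tmp.index[pos]] = None
            # restart at the same position: the next value is now compared
            # with the last value that was kept, not with the outlier
            remove_outliers(s, pos)
            break

s = pd.Series([55, 51, 52, 53, 54, None, None, 600, 49, 48, 50, 51, 7, None, None, 52, None])
remove_outliers(s)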
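The same "compare with the last kept value" idea can also be written as a single loop without recursion. This is only a sketch under a few assumptions: `remove_outliers_iterative`, `low` and `high` are hypothetical names of mine, the index labels are unique, and the values are nonzero (the ratio divides by the previous kept value). It returns a cleaned copy rather than modifying the input.

```python
import pandas as pd

def remove_outliers_iterative(s, low=0.9, high=1.1):
    """Return a copy of s where values outside (low, high) times the
    last kept value are replaced with NaN."""
    out = s.astype(float).copy()
    prev = None  # last value that was kept
    for label, value in s.dropna().items():
        if prev is not None and not (low < value / prev < high):
            out.loc[label] = float("nan")  # outlier: drop it, keep prev unchanged
        else:
            prev = value  # value accepted, becomes the new reference
    return out

s = pd.Series([55, 51, 52, 53, 54, None, None, 600, 49, 48, 50, 51, 7, None, None, 52, None])
cleaned = remove_outliers_iterative(s)
print(cleaned)
```

On this sample only 600 and 7 are replaced, because the values that follow them are compared with 54 and 51 respectively.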
Upvotes: 0
Reputation: 1599
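You can create a new column holding the previous values using the shift method. Note that shift(1) moves values down one row, so each row sees the value before it; shift(-1) would give the next value instead.
df["previous_value"] = df["required_column"].shift(1)
The percentage change relative to the previous value can then be obtained using
df["percent_change"] = (df["required_column"] - df["previous_value"]) / df["previous_value"]
You can now filter on the percent change according to your requirements.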
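As a rough sketch of how the shift-based steps fit together, here is a small runnable example using a few values from the question. The column names `value` and `filtered` and the 10% threshold are assumptions for illustration; `shift(1)` is used so that each row is compared with the row before it.

```python
import pandas as pd

# A few consecutive values from the question's series.
df = pd.DataFrame({"value": [662.105, 651.010, 454.870, 597.840, 662.405, 660.735]})

# shift(1) moves every value down one row, so each row sees its predecessor.
df["previous_value"] = df["value"].shift(1)

# Relative change with respect to the previous value.
df["percent_change"] = (df["value"] - df["previous_value"]) / df["previous_value"]

# Keep values within 10% of the previous one; the first row has no
# previous value, so it is kept as-is.
within = df["percent_change"].abs() <= 0.1
df["filtered"] = df["value"].where(within | df["previous_value"].isna())
print(df)
```

This gives the same result as comparing `pct_change().abs()` with the threshold; the extra columns are mainly useful if you want to inspect the intermediate values.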
Upvotes: 0