Reputation: 5542

How can I replace any value with NaN that is not within a certain range of the previous value in a pandas Series?

I have a pandas Series and I want to check whether each value is within a certain range of the previous value (say, 10% above or below) and replace it with NaN if it is not. I am not sure how to proceed. The standard outlier-removal techniques mostly rely on global statistics such as the overall standard deviation.

How can I access the previous value at every step and operate on it?

2018-09-06        NaN
2018-09-07        NaN
2018-09-08        NaN
2018-09-09    662.105
2018-09-10    651.010
2018-09-11    454.870
2018-09-12    597.840
2018-09-13    662.405
2018-09-14    660.735
2018-09-15    671.065
2018-09-16    668.485
2018-09-17    666.205
2018-09-18    663.620
2018-09-19    663.320
2018-09-20    662.715
2018-09-21    665.145
2018-09-22    663.015
2018-09-23    663.775
2018-09-24    662.860
2018-09-25    663.315
2018-09-26    665.600
2018-09-27    664.080
2018-09-28    661.800
2018-09-29    659.825
2018-09-30    659.370
2018-10-01        NaN
2018-10-02        NaN
2018-10-03        NaN
2018-10-04        NaN

Upvotes: 2

Answers (3)

Vaishali

Reputation: 38415

You can use pct_change, as @ALollz mentioned in the comments. Use Series.loc to set the values where the condition is not met (a change of more than 10% from the previous value) to NaN.

ts.loc[ts.pct_change().abs() > 0.1] = np.nan

2018-09-06        NaN
2018-09-07        NaN
2018-09-08        NaN
2018-09-09    662.105
2018-09-10    651.010
2018-09-11        NaN
2018-09-12        NaN
2018-09-13        NaN
2018-09-14    660.735
2018-09-15    671.065
2018-09-16    668.485
2018-09-17    666.205
2018-09-18    663.620
2018-09-19    663.320
2018-09-20    662.715
2018-09-21    665.145
2018-09-22    663.015
2018-09-23    663.775
2018-09-24    662.860
2018-09-25    663.315
2018-09-26    665.600
2018-09-27    664.080
2018-09-28    661.800
2018-09-29    659.825
2018-09-30    659.370
2018-10-01        NaN
2018-10-02        NaN
2018-10-03        NaN
2018-10-04        NaN
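
For reference, here is a minimal self-contained sketch of the same approach, using only the first six non-NaN values from the question. The names `too_far` and `cleaned` are illustrative and not part of the original answer. Note that pct_change compares each value with the raw previous value, so the values right after the dip are flagged as well, as in the output above.

```python
import numpy as np
import pandas as pd

# First six non-NaN values from the question's series
ts = pd.Series(
    [662.105, 651.010, 454.870, 597.840, 662.405, 660.735],
    index=pd.date_range("2018-09-09", periods=6),
)

# True where a value differs from the previous one by more than 10%.
# The first entry of pct_change() is NaN, which compares as False.
too_far = ts.pct_change().abs() > 0.1

# Work on a copy so the original series stays intact
cleaned = ts.copy()
cleaned.loc[too_far] = np.nan
print(cleaned)
```

If values after an outlier should instead be compared with the last accepted value, a stateful loop is needed rather than pct_change.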

Upvotes: 4

Andrew Louw

Reputation: 687

Because you need state (the previous row's value matters), you can't just use an apply or NumPy operation; you need to iterate through the rows. Here is something that does that: every time it finds an outlier, it sets it to NaN and then recursively restarts itself, so that the outlier doesn't affect the following value. For this to work, the series index must be unique.

import pandas as pd

def remove_outliers(s, i=0):
    # work on the non-NaN values only, starting at position i
    tmp = s.dropna().iloc[i:]
    for j, (label, value) in enumerate(tmp.items()):
        if j > 0:
            # replace with custom condition, tmp.iloc[j-1] is the previous value
            if not (0.9 < value / tmp.iloc[j-1] < 1.1):
                s.loc[label] = None
                # restart at the last good value, which now sits just before the next candidate
                remove_outliers(s, i + j - 1)
                break

s = pd.Series([55,51,52,53,54,None,None,600,49,48,50,51,7,None,None,52,None])
remove_outliers(s)
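
The same idea can also be sketched without recursion by carrying the last accepted value through a single pass. This is an illustrative variant, not the answer's original code: the function name `drop_jumps` and the `tol` parameter are hypothetical. It works by position, so it does not need a unique index, and it assumes the values are nonzero.

```python
import numpy as np
import pandas as pd

def drop_jumps(s, tol=0.1):
    """Return a copy of s with values set to NaN when they differ from the
    last accepted (non-NaN, non-outlier) value by tol or more, relatively."""
    out = s.copy()
    last = np.nan  # last accepted value; NaN until the first valid value is seen
    for pos, value in enumerate(s.to_numpy()):
        if pd.isna(value):
            continue  # existing gaps are left untouched
        if pd.isna(last) or abs(value / last - 1) < tol:
            last = value  # accept and use as the new reference
        else:
            out.iloc[pos] = np.nan  # reject; the reference stays unchanged
    return out

s = pd.Series([55, 51, 52, 53, 54, None, None, 600, 49, 48, 50, 51, 7, None, None, 52, None])
result = drop_jumps(s)
print(result)
```

With the sample series, only 600 and 7 are replaced; 49 is kept because it is compared with 54, the last accepted value, rather than with 600.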

Upvotes: 0

Adarsh Chavakula

Reputation: 1599

You can create a new column to get previous values using the shift method.

df["previous_value"] = df["required_column"].shift(1)

The percentage change can then be obtained using

df["percent_change"] = (df["required_column"]-df["previous_value"])/df["previous_value"]

You can now filter on the percent change according to your requirements.
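
As a rough sketch of that last step, the snippet below builds the two helper columns and then replaces values whose change exceeds a threshold. The sample data and the 10% `threshold` are assumptions made for illustration. Here `shift(1)` aligns each row with the row before it, and the absolute value is taken so that the sign of the change does not matter.

```python
import numpy as np
import pandas as pd

# Made-up sample data for illustration
df = pd.DataFrame({"required_column": [100.0, 105.0, 60.0, 101.0, 99.0]})

# shift(1) moves values down one row, so each row sees the previous value
df["previous_value"] = df["required_column"].shift(1)
df["percent_change"] = (df["required_column"] - df["previous_value"]) / df["previous_value"]

threshold = 0.1  # assumed: 10% above or below the previous value

# Compute the mask first, then overwrite; the first row's NaN change is not flagged
outside = df["percent_change"].abs() > threshold
df.loc[outside, "required_column"] = np.nan
print(df)
```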

Upvotes: 0
