Reputation: 68
I'm really struggling with the Pandas rolling_apply function. I'm trying to apply a filter to some time series data, as below, and build a new series that flags outliers. I want it to return True when the value is an outlier.
import numpy as np
import pandas as pd

ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
window, alpha, gamma = 60, .05, .03

def trim_moments(arr, alpha):
    arr = np.sort(arr)  # np.sort returns a sorted copy; it does not sort in place
    n = len(arr)
    k = int(round(n*float(alpha))/2)
    return np.mean(arr[k+1:n-k]), np.std(arr[k+1:n-k])
# First function that tests whether the criterion is met.
def bg_test(arr, alpha, gamma):
    local_mean, local_std = trim_moments(arr, alpha)
    return np.abs(arr - local_mean) < 3 * local_std + gamma
This is the call that I run:
outliers = pd.rolling_apply(ts, window, bg_test, args=(alpha,gamma))
It raises the error:
TypeError: only length-1 arrays can be converted to Python scalars
My troubleshooting indicates that the problem lies in the boolean return statement. I keep getting a similar error when I simplify the function and use np.mean/std rather than my own functions. Previous questions about this TypeError seem to have involved non-vectorized operations on NumPy arrays, but that doesn't seem to be the issue here.
What am I doing wrong here?
Upvotes: 0
Views: 1321
Reputation: 52246
The message is less than helpful, but I believe the error happens because rolling_apply expects the applied function to return a single scalar for each window (it may even have to be a float), whereas bg_test returns a boolean array with one entry per element of the window. If you break your three operations (mean, std, outlier logic) into separate steps, it should work:
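ts.name = 'value'
df = pd.DataFrame(ts)

def trimmed_apply(arr, alpha, f):
    arr = np.sort(arr)  # np.sort returns a sorted copy; it does not sort in place
    n = len(arr)
    k = int(round(n*float(alpha))/2)
    return f(arr[k+1:n-k])  # f reduces the trimmed window to one scalar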
df['trimmed_mean'] = pd.rolling_apply(df['value'], window, trimmed_apply, args=(alpha, np.mean))
df['trimmed_std'] = pd.rolling_apply(df['value'], window, trimmed_apply, args=(alpha, np.std))
# True where the value lies outside the trimmed 3-sigma band, i.e. is an outlier
df['outlier'] = np.abs(df['value'] - df['trimmed_mean']) > 3 * df['trimmed_std'] + gamma
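For anyone on a newer pandas release, where the top-level pd.rolling_apply is no longer available, the same three steps can be sketched with the .rolling(...).apply(...) method. This is only a rough sketch under a few assumptions: the helper name trimmed_stat and the seeded generator rng are illustrative and not from the original post, raw=True is used so the helper receives a plain NumPy array, and the comparison is written with > so that True marks an outlier, as the question asks.

```python
import numpy as np
import pandas as pd

window, alpha, gamma = 60, .05, .03

def trimmed_stat(arr, alpha, f):
    """Sort the window, trim its tails, and reduce it to one scalar with f."""
    arr = np.sort(arr)                 # sorted copy of the window
    n = len(arr)
    k = int(round(n * alpha) / 2)      # trim count, computed as in the post
    return f(arr[k + 1:n - k])         # same slice as in the post

# Illustrative data: seeded so the sketch is reproducible (an assumption).
rng = np.random.default_rng(0)
ts = pd.Series(rng.standard_normal(1000),
               index=pd.date_range('1/1/2000', periods=1000))

roll = ts.rolling(window)
# Each call returns one float per window, which is what rolling apply expects.
trimmed_mean = roll.apply(trimmed_stat, raw=True, args=(alpha, np.mean))
trimmed_std = roll.apply(trimmed_stat, raw=True, args=(alpha, np.std))

# The boolean outlier test is done on whole Series, outside the rolling call.
outliers = (ts - trimmed_mean).abs() > 3 * trimmed_std + gamma
```

The first window - 1 entries of trimmed_mean and trimmed_std are NaN because the window is not yet full, so the comparison is False there.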
Upvotes: 1