Reputation: 875
My DataFrame contains multiple time series. I want to flag every point that is more than one standard deviation above the mean of its own time series.
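import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 10), index=['ts_A', 'ts_B', 'ts_C'])
std = df.std(axis=1)    # one standard deviation per row (time series)
mean = df.mean(axis=1)  # one mean per row (time series)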
And then I was hoping to be able to do:
df.mask(df > (std + mean), 'True', inplace=True)
This should return the original DataFrame, with every value that is more than one standard deviation above the mean of its row (time series) replaced by True.
Instead, it returns False for every element. If I use df.where instead, the entire DataFrame is filled with True.
I could iterate through the index and mask each row in turn, but I'm sure there must be a better way.
Upvotes: 0
Views: 191
Reputation: 323386
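Use gt with axis=0, so that std + mean (one value per row) is aligned with the row index rather than with the column labels: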
df.mask(df.gt(std + mean,axis=0), 'True', inplace=True)
df
              0          1          2          3          4          5          6
ts_A   0.003797   0.060297   0.265496   0.442663       True   0.498443   0.436738
ts_B   0.127535   0.644332       True   0.079317  0.0411021       True   0.830672
ts_C   0.693698   0.429689   0.371802   0.312407  0.0555868       True       True

              7          8          9
ts_A   0.403529   0.392445   0.238355
ts_B   0.732539   0.030451   0.895976
ts_C   0.907143   0.912002   0.098821
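For anyone who wants to check what axis=0 is doing, here is a minimal sketch of the comparison step on its own. The seeded generator and the names threshold, flags and flags_np are my own assumptions for illustration, not part of the question. It compares the gt result against a plain NumPy broadcast in which the thresholds are reshaped into a column vector, so each row is tested against its own threshold.

```python
import numpy as np
import pandas as pd

# Seeded generator is an assumption for reproducibility;
# the question used np.random.rand.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((3, 10)), index=["ts_A", "ts_B", "ts_C"])

# One threshold per row (time series), indexed by ts_A, ts_B, ts_C.
threshold = df.mean(axis=1) + df.std(axis=1)

# axis=0 matches the threshold's index against the DataFrame's row labels.
flags = df.gt(threshold, axis=0)

# The same comparison in NumPy: a (3, 1) column of thresholds broadcasts
# across the 10 columns, so each row is compared with its own threshold.
flags_np = df.to_numpy() > threshold.to_numpy()[:, None]

# Report whether the two approaches agree on every element.
print((flags.to_numpy() == flags_np).all())
```

If the two approaches agree, the label-aligned version and the positional version are interchangeable here; gt has the advantage of still working when the threshold Series is in a different row order than the DataFrame.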
If you need a DataFrame of True and False values instead:
TorF=df.gt(std + mean,axis=0)
TorF
Out[31]:
          0      1      2      3      4      5      6      7      8      9
ts_A  False  False  False  False   True  False  False  False  False  False
ts_B  False  False   True  False  False   True  False  False  False  False
ts_C  False  False  False  False  False   True   True  False  False  False
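Keeping the flags in a separate boolean DataFrame leaves the original values numeric, which may be easier to work with than mixing 'True' strings into float columns. As a rough sketch (the seeded data and the names flags, counts and flagged_only are assumptions for illustration), the flags can be counted per series or used to keep only the flagged values:

```python
import numpy as np
import pandas as pd

# Seeded generator is an assumption for reproducibility.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((3, 10)), index=["ts_A", "ts_B", "ts_C"])

# Boolean flags: True where a value exceeds its row's mean + std.
flags = df.gt(df.mean(axis=1) + df.std(axis=1), axis=0)

# Number of flagged points in each time series.
counts = flags.sum(axis=1)

# Keep the flagged values and replace everything else with NaN;
# the result stays a float DataFrame.
flagged_only = df.where(flags)

print(counts)
print(flagged_only)
```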
Upvotes: 2