iRoygbiv

Reputation: 875

Mask DataFrame with list of values as condition

My DataFrame contains multiple time series, one per row. I want to flag every point in each time series that is more than one standard deviation above that series' mean.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 10), index=['ts_A', 'ts_B', 'ts_C'])

std = df.std(axis=1)
mean = df.mean(axis=1)

And then I was hoping to be able to do:

df.mask(df > (std + mean), 'True', inplace=True)

This should return the original DataFrame, with every value that is more than one standard deviation above the mean for its row/time series replaced by True.

However, the comparison returns False for every element, so nothing is replaced. If I use df.where instead, the entire DataFrame gets filled with True.

I could do this by iterating through the index and masking each row in turn, but I'm sure there must be a better way.

Upvotes: 0

Views: 191

Answers (1)

BENY

Reputation: 323386

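Use gt with axis=0. A plain df > (std + mean) tries to match the Series labels (ts_A, ts_B, ts_C) against the column labels (0 to 9), which never match. Passing axis=0 matches them against the row index instead: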

df.mask(df.gt(std + mean, axis=0), 'True', inplace=True)
df
             0         1         2         3          4         5         6 
ts_A  0.003797  0.060297  0.265496  0.442663       True  0.498443  0.436738   
ts_B  0.127535  0.644332      True  0.079317  0.0411021      True  0.830672   
ts_C  0.693698  0.429689  0.371802  0.312407  0.0555868      True      True   
             7         8         9  
ts_A  0.403529  0.392445  0.238355  
ts_B  0.732539  0.030451  0.895976  
ts_C  0.907143  0.912002  0.098821 
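
To make the row-wise comparison easier to check by hand, here is a minimal sketch on a small fixed frame instead of random data. The values and the names threshold, flags and by_broadcast are made up for illustration; the second comparison uses plain NumPy broadcasting as a cross-check and is not part of the original approach.

```python
import numpy as np
import pandas as pd

# Small hand-made frame (illustrative values, not the data from the question).
df = pd.DataFrame(
    [[1.0, 2.0, 3.0, 10.0],
     [5.0, 5.0, 5.0, 5.0],
     [0.0, 4.0, 0.0, 0.0]],
    index=["ts_A", "ts_B", "ts_C"],
)

# One threshold per row: mean plus one (sample) standard deviation.
threshold = df.mean(axis=1) + df.std(axis=1)

# axis=0 matches the threshold labels against the row index.
flags = df.gt(threshold, axis=0)

# Cross-check with NumPy: reshape the thresholds into a column vector
# so each row is compared against its own threshold.
by_broadcast = df.to_numpy() > threshold.to_numpy()[:, None]

print(flags)
```

Only the 10.0 in ts_A and the 4.0 in ts_C exceed their row thresholds here. The constant row ts_B has a standard deviation of 0, and since the comparison is strict, none of its values are flagged.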

If you need a boolean DataFrame of True and False values instead:

TorF = df.gt(std + mean, axis=0)
TorF
Out[31]: 
          0      1      2      3      4      5      6      7      8      9
ts_A  False  False  False  False   True  False  False  False  False  False
ts_B  False  False   True  False  False   True  False  False  False  False
ts_C  False  False  False  False  False   True   True  False  False  False
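
Once you have the boolean frame, a couple of follow-up steps are possible without touching the original data. This is only a sketch on a small made-up frame; the names flags, counts and only_flagged are assumptions for illustration.

```python
import pandas as pd

# Small hand-made frame (illustrative values, not the data from the question).
df = pd.DataFrame(
    [[1.0, 2.0, 3.0, 10.0],
     [5.0, 5.0, 5.0, 5.0],
     [0.0, 4.0, 0.0, 0.0]],
    index=["ts_A", "ts_B", "ts_C"],
)

# Boolean frame: True where a value is above its row's mean + std.
flags = df.gt(df.mean(axis=1) + df.std(axis=1), axis=0)

# Number of flagged points per time series (True counts as 1).
counts = flags.sum(axis=1)

# Keep only the flagged values; everything else becomes NaN.
# where() returns a new frame, so df itself is left unchanged.
only_flagged = df.where(flags)

print(counts)
print(only_flagged)
```

Keeping the flags in a separate boolean frame like this avoids mixing floats and True values in the same columns, which is what happens when mask is applied in place.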

Upvotes: 2
