Reputation: 303
I want to remove a subset of rows (here, my outliers) from a pandas DataFrame.
For example:
df = data[~(data.outlier1 == 1)]
But my DataFrame has multiple outlier columns.
Is there something like:
df = data[~((data.outlier1 == 1) or (data.outlier2 == 1) or (data.outlier3 == 1))]
The idea is to remove all outliers (flagged in different columns) at the same time.
Upvotes: 1
Views: 1160
Reputation: 15394
Another method is to truncate outliers by winsorizing rather than dropping them. In the example below, the values of a Series are floored at the 5th percentile and capped at the 95th percentile, without losing any rows:
import pandas as pd
from scipy.stats import mstats
# Jupyter/IPython only: render plots inline (omit this line in a plain script)
%matplotlib inline
test_data = pd.Series(range(30))
test_data.plot()
# Truncate values to the 5th and 95th percentiles
transformed_test_data = pd.Series(mstats.winsorize(test_data, limits=[0.05, 0.05]))
transformed_test_data.plot()
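For a DataFrame with several numeric columns, a similar effect can be sketched with DataFrame.clip and per-column quantiles. Note that this is percentile clipping in plain pandas, not mstats.winsorize itself, and the frame and its column names a and b are made up for illustration:

```python
import pandas as pd

# Hypothetical data: two numeric columns, each with one extreme value
frame = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 100.0],
    "b": [-50.0, 10.0, 11.0, 12.0, 13.0],
})

# Per-column 5th and 95th percentiles (each is a Series indexed by column name)
lower = frame.quantile(0.05)
upper = frame.quantile(0.95)

# axis=1 aligns the bounds with the columns, so every column gets its own limits
capped = frame.clip(lower=lower, upper=upper, axis=1)
print(capped)
```

The row count is unchanged; only the extreme values are pulled in to the column's own bounds.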
Upvotes: 0
Reputation: 393863
IIUC, you just need to use the bitwise OR operator | to test for multiple conditions:
df = data[~((data.outlier1 == 1) | (data.outlier2 == 1) | (data.outlier3 == 1))]
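The reason is that each comparison returns a boolean Series, not a single True/False. Python's or tries to reduce each Series to one truth value, which raises a ValueError (the truth value of a Series is ambiguous). The bitwise | operator combines the Series element-wise instead. Keep the parentheses around each comparison, because | binds more tightly than ==.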
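Here is a minimal sketch of the filter on a small made-up frame. The value column and the sample flags are assumptions for illustration; only the outlier1, outlier2 and outlier3 names come from the question:

```python
import pandas as pd

# Hypothetical data: each outlierN column flags a row with 1
data = pd.DataFrame({
    "value":    [10, 11, 250, 12, -90, 13],
    "outlier1": [0, 0, 1, 0, 0, 0],
    "outlier2": [0, 0, 0, 0, 1, 0],
    "outlier3": [0, 0, 1, 0, 0, 0],
})

# Element-wise OR across the flag columns, then negate to keep the clean rows
mask = (data.outlier1 == 1) | (data.outlier2 == 1) | (data.outlier3 == 1)
df = data[~mask]

# Equivalent form that is easier to extend to many flag columns
flag_cols = ["outlier1", "outlier2", "outlier3"]
df_any = data[~(data[flag_cols] == 1).any(axis=1)]

print(df)
```

Both forms drop every row flagged in at least one column. The any(axis=1) version may be more convenient once the list of flag columns grows.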
Upvotes: 2