Reputation: 564
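Assuming I have a pandas DataFrame, I use the following to remove outliers:
import numpy as np
from scipy import stats
y = df['Label']  # keep the labels aside
df = df.drop(['Label'], axis=1)  # z-score only the feature columns
new_df = df[(np.abs(stats.zscore(df)) < 3).all(axis=1)]  # keep rows with |z| < 3 in every column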
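For reference, `stats.zscore` works column by column (axis=0) and by default uses the population standard deviation (ddof=0). So the filter above keeps row i only if every feature column j satisfies the condition on the last line:

```latex
z_{ij} = \frac{x_{ij} - \bar{x}_j}{\sigma_j}, \qquad
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_{ij} - \bar{x}_j\right)^2}

\text{row } i \text{ is kept} \iff \max_j \lvert z_{ij} \rvert < 3
```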
Since I don't want the 'Label' column included in the process, how can I also remove the labels of the outlier rows?
Thank you
Upvotes: 1
Views: 305
Reputation: 11
With the autooptimizer module, you can easily remove outliers from your dataset. It uses the interquartile range (IQR) method to remove outliers:
pip install autooptimizer
from autooptimizer.process import outlier_removal
outlier_removal(data)
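The answer doesn't show how autooptimizer applies the IQR rule, so the following is not its implementation. It is a minimal pandas sketch of the common IQR rule, assuming the conventional Tukey fences [Q1 - 1.5·IQR, Q3 + 1.5·IQR]. The helper name `iqr_mask` and the toy data are hypothetical. The row mask is computed on the feature columns only and then applied to the whole frame, so 'Label' stays aligned:

```python
import pandas as pd


def iqr_mask(X: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: True for rows whose values all lie inside the IQR fences."""
    q1 = X.quantile(0.25)
    q3 = X.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr  # k=1.5 is the conventional (assumed) fence factor
    upper = q3 + k * iqr
    # DataFrame-vs-Series comparison aligns the Series index with the column labels,
    # so each column is checked against its own bounds.
    return ((X >= lower) & (X <= upper)).all(axis=1)


# Hypothetical toy data: 50.0 is the outlier.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 50.0],
    "Label": ["x", "y", "x", "y", "x", "y", "x", "y"],
})

X = df.drop(columns=["Label"])  # compute the fences on features only
mask = iqr_mask(X)
new_df = df[mask]  # same mask filters the rows, 'Label' included
```

Because the mask is a boolean Series on the original index, the same `mask` could also be applied to a separately stored label Series (`y[mask]`).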
Upvotes: 0
Reputation: 518
Because the filtered rows keep their original index, you can use that index to match new_df with the Label column:
new_df.join(y)
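A small self-contained sketch of this on hypothetical toy data, where dropping row 2 stands in for the z-score filter. `DataFrame.join` aligns on the index, and with its default `how="left"` it keeps only the rows of `new_df`, so the labels of the dropped rows disappear. `y.loc[new_df.index]` does the same alignment if you prefer to keep features and labels as separate objects:

```python
import pandas as pd

# Hypothetical toy data; the Series keeps the name 'Label', which join() uses as the column name.
df = pd.DataFrame({"a": [1.0, 2.0, 99.0, 3.0], "Label": ["x", "y", "z", "x"]})
y = df["Label"]
X = df.drop(columns=["Label"])

# Stand-in for the outlier filter: pretend row 2 was flagged.
new_df = X.drop(index=2)

# Index-aligned join; default how="left" keeps only new_df's rows.
with_labels = new_df.join(y)

# Alternative: select the surviving labels by index, keeping X and y separate.
y_clean = y.loc[new_df.index]
```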
Upvotes: 1
Reputation: 13527
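Just perform the z-score calculation on the columns with a numeric dtype. There's no need to drop the "Label" column beforehand, as long as it isn't numeric itself (a numeric label column would be included in the z-score):
new_df = df[(np.abs(stats.zscore(df.select_dtypes("number"))) < 3).all(axis=1)]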
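A minimal sketch on hypothetical toy data with a string 'Label' column. To keep it dependency-free, it replaces `stats.zscore` with the equivalent pandas expression (population standard deviation, ddof=0, which is scipy's default). Only the numeric columns feed the z-score, but the resulting row mask filters the whole frame, so the labels of outlier rows are removed along with them:

```python
import pandas as pd

# Hypothetical toy data: 20 rows, the last value of 'a' (100.0) is the outlier.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0] * 6 + [2.0, 100.0],
    "b": [5.0, 6.0] * 10,
    "Label": ["x", "y"] * 10,
})

num = df.select_dtypes(include="number")  # only 'a' and 'b'; string 'Label' is skipped
z = (num - num.mean()) / num.std(ddof=0)  # per-column z-score, same as scipy's default
new_df = df[(z.abs() < 3).all(axis=1)]  # row mask applied to df, so 'Label' is filtered too
```

Here the outlier's z-score in column 'a' is about 4.36, so only that row is dropped.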
Upvotes: 2