Reputation: 734
Say I have a dataframe with features and labels:
f1 f2 label
-1000 -100 1
-5 3 2
0 4 3
1 5 1
3 6 1
1000 100 2
I want to filter outliers from columns f1 and f2 to get:
f1 f2 label
-5 3 2
0 4 3
1 5 1
3 6 1
I know that I can do something like this:
data = data[(data > data.quantile(.05)) & (data < data.quantile(.95))]
But the 'label' column will also be filtered. How can I exclude certain columns from the filter? I don't want to filter each column manually because there are dozens of them. Thanks.
Upvotes: 3
Views: 1585
Reputation: 210942
What about the following approach:
In [306]: x = data.drop('label', axis=1)
In [307]: x.columns
Out[307]: Index(['f1', 'f2'], dtype='object')
In [308]: data[((x > x.quantile(.05)) & (x < x.quantile(.95))).all(axis=1)]
Out[308]:
   f1  f2  label
1  -5   3      2
2   0   4      3
3   1   5      1
4   3   6      1
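The same idea can be wrapped in a small helper so it works with any number of feature columns. This is only a sketch: the function name `filter_outliers` and its parameters `exclude`, `lower` and `upper` are made up for illustration, and the sample frame is rebuilt from the question's data.

```python
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({
    "f1": [-1000, -5, 0, 1, 3, 1000],
    "f2": [-100, 3, 4, 5, 6, 100],
    "label": [1, 2, 3, 1, 1, 2],
})

def filter_outliers(df, exclude=("label",), lower=0.05, upper=0.95):
    """Hypothetical helper: keep rows where every non-excluded column
    lies strictly between its own lower and upper quantiles."""
    features = df.drop(columns=list(exclude))
    low = features.quantile(lower)    # per-column lower bound
    high = features.quantile(upper)   # per-column upper bound
    in_range = (features > low) & (features < high)
    # A row survives only if all of its feature columns are in range.
    return df[in_range.all(axis=1)]

filtered = filter_outliers(data)
print(filtered)
```

Because the boolean mask is built from the feature columns only and then reduced to one value per row, the excluded columns are never compared against their quantiles, yet they are still carried along in the result.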
Upvotes: 2