Reputation: 433
I have a df with 9 columns. Each column has values 0,1.
1 -means outlier.
It's outliers according to 9 different algorithms. I want to select those true outliers, the following query does work.
true_outliers= outliers[
(outliers['isolation_forest_300000']==1) &
(outliers['knn_1000']==1) &
(outliers['knn_10000']==1)&
(outliers['abod_neighbors_5_1000']==1)&
(outliers['abod_neighbors_5_10000']==1)&
(outliers['abod_neighbors_10_1000']==1)&
(outliers['hbos_1000']==1)&
(outliers['hbos_10000']==1)&
(outliers['hbos_100000']==1)]
however how can i refactor it like this:
for col in outliers.columns.tolist():
s= outliers[outliers[col] == 1]
I want it go thru loop and only select those rows that are '1' in each column
Upvotes: 0
Views: 68
Reputation: 1986
I think this can help you:
import functools
import operator
import pandas as pd
data = [[0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0]]
df = pd.DataFrame(
data, columns=[str(i) for i in range(9)]
)
condition = functools.reduce(
operator.and_,
(df[col] == 1 for col in df.columns)
)
print(df[condition])
Upvotes: 0
Reputation: 25239
If you want to select rows with 1
on every column, using a mask is better
Sample df:
Out[266]:
isolation_forest_300000 knn_1000 knn_10000 abod_neighbors_5_1000 \
0 1 1 1 1
1 0 0 0 1
2 0 0 0 0
3 1 1 1 1
abod_neighbors_5_10000 abod_neighbors_10_1000 hbos_1000 hbos_10000 \
0 1 1 1 1
1 1 0 0 0
2 0 0 0 0
3 1 1 1 1
hbos_100000
0 1
1 0
2 0
3 1
use eq
and all
to create mask and slicing
df[df.eq(1).all(1)]
Out[267]:
isolation_forest_300000 knn_1000 knn_10000 abod_neighbors_5_1000 \
0 1 1 1 1
3 1 1 1 1
abod_neighbors_5_10000 abod_neighbors_10_1000 hbos_1000 hbos_10000 \
0 1 1 1 1
3 1 1 1 1
hbos_100000
0 1
3 1
Upvotes: 1