Reputation: 3628
I'm having a pandas table which contains some columns:
col_list = list('ABC')
df = pd.DataFrame(np.random.randint(10, size=(5,3)), columns=col_list)
A B C
0 8 5 7
1 5 4 1
2 7 5 6
3 6 6 0
4 3 4 1
and I have a list of thresholds according to which I want to filter df:
thr = [3, 6, 9]
Is there a ways to filter df
in each column according to the relevant threshold in thr
, so that
new_df = df[(df['A']>thr[0]) & (df['B']>thr[1]) & (df['C']>thr[2]) )
without explicitly write it using list comprehension of operators, something like
not_sure = [df.iloc[:, [i]]>thr[i] for i in range(3)]
?
Upvotes: 1
Views: 381
Reputation: 862781
Use boolean indexing
with boolean mask created by np.all
:
print (df)
A B C
0 5 8 10
1 5 4 1
2 7 5 6
3 6 6 0
4 3 4 1
thr = [3, 6, 9]
df = df[np.all(df.values > np.array(thr), axis=1)]
print (df)
A B C
0 5 8 10
Pandas solution with DataFrame.gt
(>)
with DataFrame.all
:
df = df[df.gt(thr).all(axis=1)]
print (df)
A B C
0 5 8 10
And solution with list comaprehension:
masks = [df.iloc[:, i] > j for i, j in enumerate(thr)]
df = df[pd.concat(masks, axis=1).all(axis=1)]
Alternative:
df = df[np.logical_and.reduce(masks)]
Explanation:
First compare all values by np.array
- is necessary same lengths of thr
and columns:
print (df.values > np.array(thr))
[[ True True True]
[ True False False]
[ True False False]
[ True False False]
[False False False]]
Then check all True
s per rows by numpy.all
:
print (np.all(df.values > np.array(thr), axis=1))
[ True False False False False]
Upvotes: 4