Dawn
Dawn

Reputation: 3628

Filter pandas with operator chain from list

I'm having a pandas table which contains some columns:

col_list = list('ABC')
df = pd.DataFrame(np.random.randint(10, size=(5,3)), columns=col_list)

    A   B   C
0   8   5   7
1   5   4   1
2   7   5   6
3   6   6   0
4   3   4   1

and I have a list of thresholds according to which I want to filter df:

thr = [3, 6, 9]

Is there a ways to filter df in each column according to the relevant threshold in thr, so that

new_df = df[(df['A']>thr[0]) & (df['B']>thr[1]) & (df['C']>thr[2]) )

without explicitly write it using list comprehension of operators, something like

not_sure = [df.iloc[:, [i]]>thr[i] for i in range(3)]?

Upvotes: 1

Views: 381

Answers (1)

jezrael
jezrael

Reputation: 862781

Use boolean indexing with boolean mask created by np.all:

print (df)
   A  B   C
0  5  8  10
1  5  4   1
2  7  5   6
3  6  6   0
4  3  4   1

thr = [3, 6, 9]

df = df[np.all(df.values > np.array(thr), axis=1)]
print (df)
   A  B   C
0  5  8  10

Pandas solution with DataFrame.gt (>) with DataFrame.all:

df = df[df.gt(thr).all(axis=1)]
print (df)
   A  B   C
0  5  8  10

And solution with list comaprehension:

masks = [df.iloc[:, i] > j for i, j in enumerate(thr)]
df = df[pd.concat(masks, axis=1).all(axis=1)]

Alternative:

df = df[np.logical_and.reduce(masks)]

Explanation:

First compare all values by np.array - is necessary same lengths of thr and columns:

print (df.values > np.array(thr))
[[ True  True  True]
 [ True False False]
 [ True False False]
 [ True False False]
 [False False False]]

Then check all Trues per rows by numpy.all:

print (np.all(df.values > np.array(thr), axis=1))
[ True False False False False]

Upvotes: 4

Related Questions