MrSoLoDoLo
MrSoLoDoLo

Reputation: 375

How to filter multiple columns in a dataframe?

I am trying to filter a dataframe columnwise with conditions specified by an array.

Consider a dataframe with 2 columns.

index     A       B
1        100    200   
2        110    210
3        120    220

and a 2D array that specifies the range of values we want to filter for each column.

(array([105, 115]), array([205, 215]))

Usually, if we were to filter the columns, we would do:

df = df[df['A']<115 & df['A']>105]
df = df[df['B']<215 & df['B']>205]

Is there an efficient way to do this if I had 100 columns without a use of loop?

Upvotes: 0

Views: 202

Answers (3)

sammywemmy
sammywemmy

Reputation: 28644

We can pass a query string to the query method :

search_box = np.array([[105, 115], [205, 215]]) # 2D array

search_box
array([[105, 115],
       [205, 215]])

# query string
query = " and ".join(f"{first} < {column} < {last}"
                     for column, (first, last) in 
                     zip(df.columns, search_box))

# pass it to query method : 

df.query(query)

    A   B
1   110 210

Upvotes: 1

AdibP
AdibP

Reputation: 2939

How about

arr = (np.array([105, 115]), np.array([205, 215]))
df[df['A'].between(arr[0][0], arr[0][1]) & df['B'].between(arr[1][0], arr[1][1])]

index   A   B
2       110 210

Upvotes: 1

BENY
BENY

Reputation: 323226

Sometime use for loop is not bad, here we try concat the result of between

newdf = df[pd.concat([df[y].between(*x) for x , y in zip(l, df.columns)],axis=1).all(1)]
Out[52]: 
     A    B
1  110  210

Upvotes: 1

Related Questions