Reputation: 375
I am trying to filter a dataframe columnwise with conditions specified by an array.
Consider a dataframe with 2 columns.
index A B
1 100 200
2 110 210
3 120 220
and a 2D array that specifies the range of values we want to filter for each column.
(array([105, 115]), array([205, 215]))
Usually, if we were to filter the columns, we would do:
df = df[df['A']<115 & df['A']>105]
df = df[df['B']<215 & df['B']>205]
Is there an efficient way to do this if I had 100 columns without a use of loop?
Upvotes: 0
Views: 202
Reputation: 28644
We can pass a query string to the query
method :
search_box = np.array([[105, 115], [205, 215]]) # 2D array
search_box
array([[105, 115],
[205, 215]])
# query string
query = " and ".join(f"{first} < {column} < {last}"
for column, (first, last) in
zip(df.columns, search_box))
# pass it to query method :
df.query(query)
A B
1 110 210
Upvotes: 1
Reputation: 2939
How about
arr = (np.array([105, 115]), np.array([205, 215]))
df[df['A'].between(arr[0][0], arr[0][1]) & df['B'].between(arr[1][0], arr[1][1])]
index A B
2 110 210
Upvotes: 1
Reputation: 323226
Sometime use for loop is not bad, here we try concat
the result of between
newdf = df[pd.concat([df[y].between(*x) for x , y in zip(l, df.columns)],axis=1).all(1)]
Out[52]:
A B
1 110 210
Upvotes: 1