Petr

Reputation: 1817

Filter Pandas dataframe based on the list of columns and list of values

I would like to filter a pandas DataFrame using two lists. The first list contains the columns to filter on, and the second contains the value that each of those columns must equal, in the same order.

I have prepared an example:

import seaborn as sns
import pandas as pd

dataf = sns.load_dataset('tips')

cols = ['tip', 'sex']
vals = [1.01, 'Female']

cols2 = ['tip', 'smoker', 'day']
vals2 = [3.00, 'No', 'Sun']

# Pseudo idea of what I need
dataf.loc[lambda d: d[cols] == vals]
# should be equal to
dataf.loc[(dataf['sex'] == 'Female') & (dataf['tip'] == 1.01)]


dataf.loc[lambda d: d[cols2] == vals2]
# should be equal to
dataf.loc[(dataf['smoker'] == 'No') & (dataf['tip'] == 3) & (dataf['day'] == 'Sun')]

The snippet above also shows the result I expect. It is important that the solution generalizes, meaning that cols and cols2 can contain different numbers of elements.

It will always hold that len(cols) == len(vals) ...

Upvotes: 1

Views: 49

Answers (1)

Michael

Reputation: 2367

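This code keeps only the rows in which every column in cols equals its corresponding value in vals. Each column is compared with its own value (an isin check would accept a value found in any of the columns), and it avoids np.product, which was removed in NumPy 2.0:

# one boolean column per (column, value) pair, then require all of them per row
mask = pd.concat([dataf[c] == v for c, v in zip(cols, vals)], axis=1).all(axis=1)
dataf.loc[mask, cols]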

Output:


    tip     sex
0  1.01  Female
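
Since the question asks for something that works for any number of columns, the same idea can be wrapped in a small helper. This is only a sketch: the function name filter_by_values and the demo frame are made up for illustration (a tiny stand-in for the tips dataset so it runs without downloading anything), and it assumes exact equality is the comparison you want.

```python
import pandas as pd


def filter_by_values(df, cols, vals):
    """Keep rows where every column in `cols` equals its paired value in `vals`.

    Hypothetical helper, not part of pandas.
    """
    if len(cols) != len(vals):
        raise ValueError("cols and vals must have the same length")
    # Start with every row selected, then narrow down one column at a time.
    mask = pd.Series(True, index=df.index)
    for col, val in zip(cols, vals):
        mask &= df[col] == val
    return df.loc[mask]


# Made-up stand-in for the 'tips' data so the example runs offline.
demo = pd.DataFrame({
    "tip": [1.01, 3.00, 3.00, 1.01],
    "sex": ["Female", "Male", "Female", "Male"],
    "smoker": ["No", "No", "Yes", "No"],
    "day": ["Sun", "Sun", "Sat", "Sun"],
})

print(filter_by_values(demo, ["tip", "sex"], [1.01, "Female"]))
print(filter_by_values(demo, ["tip", "smoker", "day"], [3.00, "No", "Sun"]))
```

With the real data this would be called as filter_by_values(dataf, cols, vals) or filter_by_values(dataf, cols2, vals2). Note that == on float columns is an exact comparison; if the values come from calculations rather than literals, something like np.isclose may be a better fit for those columns.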

Cheers.

Upvotes: 1
