Dmitriy

Reputation: 51

Efficient way to select rows from a DataFrame based on a varying list of columns

Suppose we have the following DataFrame:

import pandas as pd

dt = {'A': ['a','a','a','a','a','a','b','b','c'],
      'B': ['x','x','x','y','y','z','x','z','y'],
      'C': [10, 14, 15, 11, 10, 14, 14, 11, 10],
      'D': [1, 3, 2, 1, 3, 5, 1, 4, 2]}
df = pd.DataFrame(data=dt)

I want to extract the rows that match a dictionary whose keys are column names and whose values are the values required in those columns. The set of columns varies from one dictionary to the next. For example:

d = {'A': 'a', 'B': 'x'}
d = {'A': 'a', 'B': 'y', 'C': 10}
d = {'A': 'b', 'B': 'z', 'C': 11, 'D': 4}

It can be done using a loop (consider the last dictionary):

res = df  # filter a separate name so the original df is not overwritten
for iCol in d:
    res = res[res[iCol] == d[iCol]]
res
Out[215]: 
   A  B   C  D
7  b  z  11  4

Since the DataFrame is expected to be quite large and there may be many columns to select on, I am looking for an efficient way to solve the problem without using a for loop that filters the DataFrame once per column.

Upvotes: 1

Views: 48

Answers (1)

U13-Forward

Reputation: 71570

Convert the dict to a Series and compare it against the matching columns in one step:

print(df[(df[list(d)] == pd.Series(d)).all(axis=1)])

Output:

   A  B   C  D
7  b  z  11  4
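
To make the one-liner easier to follow, here is a minimal sketch that splits it into steps and wraps it in a helper. The name select_rows is hypothetical (it is not a pandas function), and the sketch assumes every key of the dict is a column of the DataFrame. The comparison relies on the Series index carrying the same labels, in the same order, as the selected columns, which holds here because both are built from the same dict.

```python
import pandas as pd

dt = {'A': ['a','a','a','a','a','a','b','b','c'],
      'B': ['x','x','x','y','y','z','x','z','y'],
      'C': [10, 14, 15, 11, 10, 14, 14, 11, 10],
      'D': [1, 3, 2, 1, 3, 5, 1, 4, 2]}
df = pd.DataFrame(data=dt)


def select_rows(frame, criteria):
    """Hypothetical helper: return the rows of `frame` whose columns
    equal the values given in the dict `criteria`."""
    cols = list(criteria)              # column names, in dict order
    wanted = pd.Series(criteria)       # index = column names, values = targets
    # Each selected column is compared with the value stored under its label.
    matches = frame[cols] == wanted    # boolean DataFrame, one column per key
    mask = matches.all(axis=1)         # True only where every column matched
    return frame[mask]


# The three dictionaries from the question
print(select_rows(df, {'A': 'a', 'B': 'x'}))
print(select_rows(df, {'A': 'a', 'B': 'y', 'C': 10}))
print(select_rows(df, {'A': 'b', 'B': 'z', 'C': 11, 'D': 4}))
```

The same helper handles any number of columns, since the dict alone decides which columns are compared.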

Upvotes: 2
