Alex Man
Alex Man

Reputation: 477

Query dataframe by column name as a variable

I know this question has already been asked here, but my question a bit different. Lets say I have following df:

import pandas as pd

df = pd.DataFrame({'A': ('a', 'b', 'c', 'd', 'e', 'a', 'b'), 'B': ('a', 'a', 'g', 'l', 'e', 'a', 'b'), 'C': ('b', 'b', 'g', 'a', 'e', 'a', 'b')})

myList = ['a', 'e', 'b']

I use this line to count the total number of occurrence of each elements of myList in my df columns:

print(df.query('A in @myList ').A.count())
5

Now, I am trying to execute the same thing by looping through columns names. Something like this:

for col in df.columns:
    print(df.query('col in @myList ').col.count())

Also, I was wondering if using query for this is the most efficient way? Thanks for the help.

Upvotes: 0

Views: 278

Answers (1)

sammywemmy
sammywemmy

Reputation: 28709

Use this :

df.isin(myList).sum()

A    5
B    5
C    6
dtype: int64

It checks every cell in the dataframe through myList and returns True or False. Sum uses the 1 or 0 reference and gets the total for each column

Upvotes: 1

Related Questions