How to subset a DataFrame, using loc(), based on multiple columns having a particular value with Python?

Question

If I want to create a subset of a DataFrame, based on a condition where a specified column can have multiple specified values, I can do this:

df = df.loc[df[column_name].isin(list_of_acceptable_values)]

If I have a list of column names, what is the best way to create a subset of a DataFrame, based on a condition, which checks if these columns contain a particular value. For example, the list of column names is:

['column_1', 'column_2', 'column_3']

And I want to create a new DataFrame, which only has rows from the initial dataframe that contain 0 in column_1, column_2 or column_3

EdChum · Accepted Answer

You can pass a list of the column names of interest to subset, then compare with 0 and test for any col values for a row using any(axis=1):

In [9]:
df = pd.DataFrame({'a':[1,1,1,0],'b':[1,1,1,1],'c':[1,0,1,1]})
df

Out[9]:
   a  b  c
0  1  1  1
1  1  1  0
2  1  1  1
3  0  1  1

In [10]:
df[(df[['a','c']]==0).any(axis=1)]

Out[10]:
   a  b  c
1  1  1  0
3  0  1  1

For the case where you have a predefined list you don't need to enclose again using square brackets:

In [12]:
col_list = ['a','c']
df[(df[col_list]==0).any(axis=1)]

Out[12]:
   a  b  c
1  1  1  0
3  0  1  1

How to subset a DataFrame, using loc(), based on multiple columns having a particular value with Python?

Answers (1)

Related Questions