Reputation: 127
If I want to create a subset of a DataFrame, based on a condition where a specified column can have multiple specified values, I can do this:
df = df.loc[df[column_name].isin(list_of_acceptable_values)]
If I have a list of column names, what is the best way to create a subset of a DataFrame, based on a condition, which checks if these columns contain a particular value. For example, the list of column names is:
['column_1', 'column_2', 'column_3']
And I want to create a new DataFrame, which only has rows from the initial dataframe that contain 0 in column_1, column_2 or column_3
Upvotes: 1
Views: 823
Reputation: 394071
You can pass a list of the column names of interest to subset, then compare with 0
and test for any col values for a row using any(axis=1)
:
In [9]:
df = pd.DataFrame({'a':[1,1,1,0],'b':[1,1,1,1],'c':[1,0,1,1]})
df
Out[9]:
a b c
0 1 1 1
1 1 1 0
2 1 1 1
3 0 1 1
In [10]:
df[(df[['a','c']]==0).any(axis=1)]
Out[10]:
a b c
1 1 1 0
3 0 1 1
For the case where you have a predefined list you don't need to enclose again using square brackets:
In [12]:
col_list = ['a','c']
df[(df[col_list]==0).any(axis=1)]
Out[12]:
a b c
1 1 1 0
3 0 1 1
Upvotes: 3