ScientiaEtVeritas
ScientiaEtVeritas

Reputation: 5278

Pandas: Select Rows with condition for several columns

I'm using this to conditionally select rows of column:

X.loc[data['column'] == 1]

But I want to expand this condition to several columns. These columns have something in common: They contain a same string. So actually I have a column1, a column2, ... , column100 etc. and this condition should apply to all of these columns. Actually something like this (wildcard):

X.loc[data['column*'] == 1]

These conditions should be linked with OR. Any chance to do this easily?

Upvotes: 1

Views: 5882

Answers (4)

Maarten Fabré
Maarten Fabré

Reputation: 7058

test Dataframe

    col0 col1 col2
0   1    1    2
1   1    1    1
2   2    2    2

make a new dataframe with the test for all columns

result_s = d.concat((df['col%i'%i] == 1 for i in range(3)), axis=1).all(axis=1)

results in

0    False
1     True
2    False
dtype: bool

if you do df[result_s] you get

    col0 col1 col2
1   1    1    1

this selects the rows where all columns are ==1 If one of the is enough, change the .all() to .any

    col0 col1 col2
0   1    1    2
1   1    1    1

Upvotes: 3

Andrew L
Andrew L

Reputation: 7038

Here's another way to consider:

df
   col0  col1  col2
0     1     1     2
1     1     1     1
2     2     2     2

df.loc[df['col0'] == 1, [x for x in df.columns if x == 'col0']]
   col0
0     1
1     1

You can use list comprehension to find the columns you're looking for.

Upvotes: 0

Chuck
Chuck

Reputation: 3852

For some dataframe X

   p A  p B  p C
0    0    0    0
1    0    0    0
2    0    0    1
3    0    0    0
4    0    0    0
5    0    0    0
6    1    0    0

If you can set up the names of the columns you want to test for in col_list

col_list = X.columns

You can then use np.any() to test with or between each:

X.loc[(X[col_list] == 1).any(axis=1)]

Which gives you:

   p A  p B  p C
2    0    0    1
6    1    0    0

Informed you don't need loc and will still get the same answer, credit to @MaartynFabre for the info

X[(X[col_list] == 1).any(axis=1)]

   p A  p B  p C
2    0    0    1
6    1    0    0

Upvotes: 7

rhombuzz
rhombuzz

Reputation: 97

Put each comparison in brackets and combine them with logical operators:

pd.DataFrame(X).loc[(data['col1']==23) & (data['col2']==42)] # and
pd.DataFrame(X).loc[(data['col1']==23) | (data['col2']==42)] # or

Upvotes: 1

Related Questions