Tox

Reputation: 854

Check if values of multiple columns are the same (python)

I have a binary dataframe and I would like to check whether all values in a given row are 1. For example, in the dataframe below, rows 0 and 2 contain 1 in col1 through col3, so the outcome should be 1; for the other rows it should be 0.

import pandas as pd
d = {'col1': [1, 0,1,0], 'col2': [1, 0,1, 1], 'col3': [1,0,1,1], 'outcome': [1,0,1,0]}
df = pd.DataFrame(data=d)

Since my own dataframe is much larger, I am looking for a more elegant way than the following. Any thoughts?

def similar(x):
    # x is a single row; return 1 only if all three columns equal 1
    if x['col1'] == 1 and x['col2'] == 1 and x['col3'] == 1:
        return 1
    else:
        return 0
df['outcome'] = df.apply(similar, axis=1)

Upvotes: 8

Views: 16086

Answers (4)

pavi2410

Reputation: 1285

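To check, row by row, whether several columns hold the same value, you could run this:

df[['col1','col2','col3']].apply(lambda d: len(set(d)) == 1, axis=1)

This gives one boolean per row. To check instead whether entire columns duplicate one another, transpose the selection first:

df[['col1','col2','col3']].T.duplicated()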
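
Note that "all values in a row are the same" is not quite what the question asks, which is "all values in a row are 1". A minimal sketch of the difference on the question's sample data follows; the names `cols`, `same` and `all_ones` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 0, 1, 0], 'col2': [1, 0, 1, 1], 'col3': [1, 0, 1, 1]})
cols = ['col1', 'col2', 'col3']

# True where the row holds a single distinct value (all 1s or all 0s)
same = df[cols].nunique(axis=1) == 1
# True only where that single value is 1
all_ones = same & (df['col1'] == 1)

print(same.tolist())      # [True, True, True, False]
print(all_ones.tolist())  # [True, False, True, False]
```

Row 1 (all zeros) passes the "same value" check but not the "all ones" check.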

Upvotes: 0

Dave Reikher

Reputation: 1954

This is more generic and works for any other value as well. Just replace the second == 1 with == <your value>.

df['outcome'] = 0
df.loc[df.loc[(df.iloc[:,:-1].nunique(axis=1) == 1) \
    & (df.iloc[:,:-1] == 1).all(axis=1)].index, 'outcome'] = 1
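
If you prefer to pass the columns and the target value explicitly instead of relying on column positions, the same idea could be wrapped in a small helper. This is only a sketch; `rows_all_equal` is a made-up name, not a pandas function.

```python
import pandas as pd

def rows_all_equal(frame, cols, value):
    """Return 1 for rows where every column in cols equals value, else 0."""
    return frame[cols].eq(value).all(axis=1).astype(int)

df = pd.DataFrame({'col1': [1, 0, 1, 0], 'col2': [1, 0, 1, 1], 'col3': [1, 0, 1, 1]})
cols = ['col1', 'col2', 'col3']

df['outcome'] = rows_all_equal(df, cols, 1)
print(df['outcome'].tolist())               # [1, 0, 1, 0]
print(rows_all_equal(df, cols, 0).tolist()) # [0, 1, 0, 0]
```

Because the columns are selected by name, an existing outcome column does not get in the way.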

Upvotes: 1

Josh Friedlander

Reputation: 11657

A classic case of all.

(The iloc is just there to exclude your current outcome column; if you didn't have it, you could just use df == 1.)

df['outcome'] = (df.iloc[:,:-1] == 1).all(axis=1).astype(int)


   col1  col2  col3  outcome
0     1     1     1        1
1     0     0     0        0
2     1     1     1        1
3     0     1     1        0
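
For anyone unsure what the one-liner does, here it is split into its three steps, on a frame that has no outcome column yet (so no iloc is needed). The intermediate names `is_one` and `row_all` are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 0, 1, 0], 'col2': [1, 0, 1, 1], 'col3': [1, 0, 1, 1]})

is_one = df == 1               # element-wise comparison, gives a boolean DataFrame
row_all = is_one.all(axis=1)   # True where every column in the row is True
outcome = row_all.astype(int)  # convert True/False to 1/0

print(outcome.tolist())  # [1, 0, 1, 0]
```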

Upvotes: 13

U13-Forward

Reputation: 71610

Try this instead:

df['outcome'] = df.apply(lambda x: 1 if x['col1']==1 and x['col2']==1 and x['col3']==1 else 0, axis=1)

Upvotes: 1
