Benoid

Reputation: 239

How to check if column is binary? (Pandas)

How can I efficiently check whether a column is binary?

   "col"  "col2"
0    0      1
1    0      0
2    0      0
3    0      0
4    0      1

There might also be a problem with columns that aren't meant to be binary but contain only zeros.

(I thought of keeping a list of column names that is filled as each column is added to the DataFrame, but is there a way to directly mark a column as "binary" during creation?)

The purpose is feature scaling for machine learning (binary columns shouldn't be scaled).

Upvotes: 1

Views: 4702

Answers (3)

CodeTrek

Reputation: 475

This is what I use; it also covers the corner cases with mixed string/numeric types:

import numpy as np
import pandas as pd

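def checkBinary(ser, dropna=False):
    # Optionally ignore missing values before converting.
    if dropna:
        ser = ser.dropna()
    try:
        # errors="raise" makes non-numeric values raise instead of silently becoming NaN.
        ser = pd.to_numeric(ser, errors="raise")
    except (ValueError, TypeError):
        return False
    # Binary only if both 0 and 1 occur and nothing else does.
    return {0, 1} == set(pd.unique(ser))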

ser = pd.Series(["0",1,"1.000", np.nan])
checkBinary(ser, dropna = True)
>> True

ser = pd.Series(["0",0,"0.000"])
checkBinary(ser)
>> False

Upvotes: 0

Emanuele

Reputation: 178

You can use this:

pd.unique(df[['col', 'col2']].values.ravel('K'))

It returns:

array([0, 1], dtype=int64)

Note that this pools the values of both columns. To check each column separately, you can also call pd.unique on each column.
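
A per-column version of this idea could look like the sketch below. It assumes the example frame from the question, rebuilt here as `df`, and `is_strictly_binary` is a hypothetical helper name. It requires both 0 and 1 to appear, so an all-zero column is not reported as binary.

```python
import pandas as pd

# Example frame rebuilt from the question (assumed values).
df = pd.DataFrame({"col": [0, 0, 0, 0, 0], "col2": [1, 0, 0, 0, 1]})

# Hypothetical helper: True only if the distinct values are exactly 0 and 1.
def is_strictly_binary(ser):
    return set(pd.unique(ser)) == {0, 1}

# Apply the check to every column; the result is a boolean Series
# indexed by column name.
result = df.apply(is_strictly_binary)
print(result)
```

Whether an all-zero column should count as binary depends on the data, so this stricter test may or may not be what you want.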

Upvotes: 0

jezrael

Reputation: 863166

To get the names of the columns that contain only 0 or 1 values:

c = df.columns[df.isin([0,1]).all()]
print (c)
Index(['col', 'col2'], dtype='object')

To select those columns:

df1 = df.loc[:, df.isin([0,1]).all()]
print (df1)
   col  col2
0    0     1
1    0     0
2    0     0
3    0     0
4    0     1
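
For the feature-scaling use case in the question, the same mask can be inverted to scale only the non-binary columns. This is a minimal sketch, not a definitive recipe: the extra `height` column and the plain z-score standardization are assumptions for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical data: two 0/1 columns plus one continuous column.
df = pd.DataFrame({
    "col": [0, 0, 0, 0, 0],
    "col2": [1, 0, 0, 0, 1],
    "height": [150.0, 160.0, 170.0, 180.0, 190.0],
})

# True for columns whose values are all 0 or 1 (all-zero columns included).
binary_mask = df.isin([0, 1]).all()

# Columns to scale: everything that is not 0/1-only.
to_scale = df.columns[~binary_mask]

scaled = df.copy()
# Z-score standardization: subtract the mean, divide by the sample standard deviation.
scaled[to_scale] = (df[to_scale] - df[to_scale].mean()) / df[to_scale].std()
print(scaled)
```

Note that the mask treats an all-zero column as binary, so a column that merely happens to contain only zeros would be left unscaled as well.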

Upvotes: 3
