mommomonthewind

Reputation: 4640

pandas: converting all columns with 2 values to True/False

I have a pandas dataframe. Some columns have only 2 unique values (such as GOOD/BAD, HIGH/LOW, FAIL/PASS). Their dtypes are object because the dataframe is loaded from a CSV file.

I want to convert these columns to True/False and automatically rename each one to COLUMN_is_FIRST_VALUE, where FIRST_VALUE is the first value that appears in the column.

For instance,

  X1   X2   X3  
  HIGH FAIL GOOD
  HIGH PASS GOOD
  LOW  FAIL BAD

should be converted to:

X1_is_HIGH  X2_is_FAIL  X3_is_GOOD
True        True        True
True        False       True
False       True        False

Upvotes: 1

Views: 1087

Answers (3)

ejb

Reputation: 145

You could also use pandas.get_dummies() to convert the categoricals:

import pandas as pd
df = pd.DataFrame({'X1': ['HIGH','HIGH','LOW'], 'X2': ['FAIL','PASS','FAIL'], 'X3': ['GOOD','GOOD','BAD']})
df2 = pd.get_dummies(df, drop_first=True)
print(df2.astype(bool))

# returns:
#   X1_LOW  X2_PASS  X3_GOOD
# 0   False    False     True
# 1   False     True     True
# 2    True    False    False

EDIT: to obtain exactly the output you asked for:

df2 = pd.get_dummies(df)
print(df2.loc[:,df2.iloc[0] == 1].astype(bool))

# returns
#   X1_HIGH  X2_FAIL  X3_GOOD
# 0     True     True     True
# 1     True    False     True
# 2    False     True    False
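If the column names should follow the X1_is_HIGH pattern from the question, the same idea can be sketched with the prefix_sep argument of pd.get_dummies. This is only a minimal sketch: it assumes the first row holds the value that should map to True in every column, and the names dummies, first_row and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'X1': ['HIGH', 'HIGH', 'LOW'],
    'X2': ['FAIL', 'PASS', 'FAIL'],
    'X3': ['GOOD', 'GOOD', 'BAD'],
})

# prefix_sep sets the text placed between the column name and the value
dummies = pd.get_dummies(df, prefix_sep='_is_').astype(bool)

# Keep only the indicator columns that are True in the first row,
# i.e. the ones built from each column's first value
first_row = dummies.iloc[0].to_numpy()
result = dummies.loc[:, first_row]
print(result)
```

With the sample frame above, the remaining columns should be X1_is_HIGH, X2_is_FAIL and X3_is_GOOD.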

Upvotes: 2

Mohit Motwani

Reputation: 4792

You could try this. Iterate over the columns and take the first unique value of each one. Comparing the column against that value gives a boolean mask, which replaces the original column. Then rename the column by appending the value to its name.

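import pandas as pd

df = pd.DataFrame({
    'X1': ['HIGH', 'LOW', 'HIGH', 'HIGH'],
    'X2': ['FAIL', 'PASS', 'FAIL', 'PASS'],
    'X3': ['GOOD', 'GOOD', 'BAD', 'BAD'],
})

for column in df.columns:
    # unique() keeps order of appearance, so [0] is the first value in the column
    uni = df[column].unique()[0]
    # True where the row equals that first value
    mask = df[column] == uni
    df[column] = mask
    df.rename(columns={column: column + '_' + uni}, inplace=True)

print(df)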

    X1_HIGH         X2_FAIL         X3_GOOD
0   True            True            True
1   False           False           True
2   True            True            False
3   True            False           False
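The loop above converts every column. To touch only the columns that have exactly two unique values, as the question asks, one possible sketch adds a nunique() check. The helper name binarize_two_valued and the extra score column are hypothetical and only there for illustration; the sketch also assumes the "True" value is the first non-missing value in each column.

```python
import pandas as pd

def binarize_two_valued(df):
    """Return a new frame where columns with exactly two unique values become boolean."""
    out = {}
    for column in df.columns:
        values = df[column]
        # nunique() ignores missing values by default
        if values.nunique() == 2:
            first = values.dropna().iloc[0]
            out[f"{column}_is_{first}"] = values == first
        else:
            # Leave every other column unchanged
            out[column] = values
    return pd.DataFrame(out, index=df.index)

df = pd.DataFrame({
    'X1': ['HIGH', 'LOW', 'HIGH', 'HIGH'],
    'X2': ['FAIL', 'PASS', 'FAIL', 'PASS'],
    'score': [1, 2, 3, 4],  # hypothetical column with more than two values
})
result = binarize_two_valued(df)
print(result)
```

Note that this check also converts numeric columns that happen to hold two values (for example 0/1), so an additional dtype filter may be needed depending on the data.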

Upvotes: 2

jpp

Reputation: 164623

You can use a dictionary to specify your True criteria. Then iterate your columns to update them. Finally, use pd.DataFrame.rename to rename columns via a custom function.

d = {'X1': 'HIGH', 'X2': 'FAIL', 'X3': 'GOOD'}

for col in df:
    df[col] = df[col] == d[col]

df = df.rename(columns=lambda x: x+'_'+d[x])

print(df)

  X1_HIGH X2_FAIL X3_GOOD
0    True    True    True
1    True   False    True
2   False    True   False
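If typing the dictionary by hand is not practical, it could be built from the first row instead. The following is a sketch under the assumption that the first row always holds the value that should become True; the comparison is done in one step with DataFrame.eq rather than a loop.

```python
import pandas as pd

df = pd.DataFrame({
    'X1': ['HIGH', 'HIGH', 'LOW'],
    'X2': ['FAIL', 'PASS', 'FAIL'],
    'X3': ['GOOD', 'GOOD', 'BAD'],
})

# Assumption: the first row defines the "True" value of each column
d = df.iloc[0].to_dict()

# Compare every row with the first row, matching on column labels
result = df.eq(df.iloc[0], axis='columns')
result = result.rename(columns=lambda x: x + '_' + d[x])
print(result)
```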

Upvotes: 2
