Reputation: 4640
I have a pandas DataFrame. Some columns have only two unique values (such as GOOD/BAD, HIGH/LOW, FAIL/PASS). Their dtypes are object because the DataFrame is loaded from a CSV file.
I want to convert these columns to True/False and automatically change each column name to COLUMN_is_FIRST_VALUE.
For instance,
X1 X2 X3
HIGH FAIL GOOD
HIGH PASS GOOD
LOW FAIL BAD
should be converted to:
X1_is_HIGH X2_is_FAIL X3_is_GOOD
True True True
True False True
False True False
Upvotes: 1
Views: 1087
Reputation: 145
You could also use pandas.get_dummies() to convert the categorical columns:
import pandas as pd
df = pd.DataFrame({'X1': ['HIGH','HIGH','LOW'], 'X2': ['FAIL','PASS','FAIL'], 'X3': ['GOOD','GOOD','BAD']})
df2 = pd.get_dummies(df, drop_first=True)
print(df2.astype(bool))
# returns:
# X1_LOW X2_PASS X3_GOOD
# 0 False False True
# 1 False True True
# 2 True False False
EDIT: to obtain the values you asked for (the columns come out as X1_HIGH rather than X1_is_HIGH):
df2 = pd.get_dummies(df)
print(df2.loc[:,df2.iloc[0] == 1].astype(bool))
# returns
# X1_HIGH X2_FAIL X3_GOOD
# 0 True True True
# 1 True False True
# 2 False True False
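If the column names should also carry the `_is_` infix from the question, one possible variation is to pass `prefix_sep` to `pd.get_dummies`. This is only a sketch: it assumes the value to flag as True is the one in each column's first row, and the names `dummies` and `result` are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'X1': ['HIGH', 'HIGH', 'LOW'],
                   'X2': ['FAIL', 'PASS', 'FAIL'],
                   'X3': ['GOOD', 'GOOD', 'BAD']})

# prefix_sep sets the text placed between the column name and the value.
dummies = pd.get_dummies(df, prefix_sep='_is_').astype(bool)

# Keep only the indicator columns that are True in the first row,
# i.e. the ones built from each column's first value (assumption).
result = dummies.loc[:, dummies.iloc[0]]
print(result)
```

Here the selection uses the boolean first row directly as a column mask, so it relies on the `astype(bool)` conversion having been applied first.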
Upvotes: 2
Reputation: 4792
You could try this. Iterate through the columns and take the first unique value of each one. Comparing the column against that value gives a boolean mask (True where the row holds the value), and that mask becomes the new content of the column. Finally, rename the column to include the value.
import pandas as pd

df = pd.DataFrame({
    'X1': ['HIGH', 'LOW', 'HIGH', 'HIGH'],
    'X2': ['FAIL', 'PASS', 'FAIL', 'PASS'],
    'X3': ['GOOD', 'GOOD', 'BAD', 'BAD']
})
for column in df.columns:
    uni = df[column].unique()[0]   # first value seen in the column
    mask = df[column] == uni       # True where the row holds that value
    df[column] = mask
    df.rename(columns={column: column + '_' + uni}, inplace=True)
print(df)
X1_HIGH X2_FAIL X3_GOOD
0 True True True
1 False False True
2 True True False
3 True False False
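Since the question mentions that only some columns are two-valued, the loop could be wrapped in a helper that skips everything else. The following is a rough sketch, not a definitive implementation: the function name `binarize_two_valued_columns` and the `score` column are made up for illustration, and it assumes "two-valued" means exactly two distinct non-missing values.

```python
import pandas as pd

def binarize_two_valued_columns(df):
    """Return a new frame where each two-valued column becomes <col>_is_<first value>."""
    out = {}
    for col in df.columns:
        values = df[col].dropna().unique()
        if len(values) == 2:
            first = values[0]
            # Missing values compare as False here.
            out[f"{col}_is_{first}"] = df[col] == first
        else:
            out[col] = df[col]  # leave other columns untouched
    return pd.DataFrame(out, index=df.index)

df = pd.DataFrame({
    'X1': ['HIGH', 'LOW', 'HIGH'],
    'X2': ['FAIL', 'PASS', 'FAIL'],
    'score': [0.1, 0.5, 0.9],   # three distinct values, so it is kept as-is
})
result = binarize_two_valued_columns(df)
print(result)
```

Building a new frame instead of renaming in place avoids changing the columns of the frame that is being iterated over.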
Upvotes: 2
Reputation: 164623
You can use a dictionary to specify your True criteria. Then iterate over your columns to update them. Finally, use pd.DataFrame.rename to rename the columns via a custom function.
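import pandas as pd

# Same data as in the question
df = pd.DataFrame({'X1': ['HIGH', 'HIGH', 'LOW'], 'X2': ['FAIL', 'PASS', 'FAIL'], 'X3': ['GOOD', 'GOOD', 'BAD']})
d = {'X1': 'HIGH', 'X2': 'FAIL', 'X3': 'GOOD'}   # value that maps to True, per column
for col in df:
    df[col] = df[col] == d[col]
df = df.rename(columns=lambda x: x + '_' + d[x])
print(df)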
X1_HIGH X2_FAIL X3_GOOD
0 True True True
1 True False True
2 False True False
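If typing the dictionary by hand is not practical, it could probably be derived from the data. The sketch below assumes the True value of each column is whatever appears in the first row, and it uses the `_is_` infix from the question; `result` is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'X1': ['HIGH', 'HIGH', 'LOW'],
                   'X2': ['FAIL', 'PASS', 'FAIL'],
                   'X3': ['GOOD', 'GOOD', 'BAD']})

# Assumption: the "True" value of each column is the one in the first row.
d = df.iloc[0].to_dict()

# Compare every column against its own reference value, then rename.
result = df.eq(pd.Series(d)).rename(columns=lambda c: f"{c}_is_{d[c]}")
print(result)
```

`DataFrame.eq` aligns the Series index with the column labels, so each column is compared with its own entry from `d` without an explicit loop.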
Upvotes: 2