Reputation: 119
I have a dataset that contains many columns with binary values.
import pandas as pd

df = pd.DataFrame({"a": ["y", "n"], "b": ["t", "f"],
                   "c": ["known", "unknown"], "d": ["found", "not found"]})
I want to convert all of the binary columns to 1/0 without affecting the other, numeric columns. Is there a simple solution in one or two lines? The dataset has over 500 columns, so checking and replacing them one by one is impractical. Thanks.
Upvotes: 0
Views: 522
Reputation: 59579
You can use pd.get_dummies with drop_first=True (credit to @piRSquared):
pd.get_dummies(df, drop_first=True)
#    a_y  b_t  c_unknown  d_not found
# 0    1    1          0            0
# 1    0    0          1            1
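As a self-contained sketch of the call above, applied to the frame from the question: the `dtype=int` argument is an addition not in the original answer. It is there because newer pandas releases return True/False from `get_dummies` by default, whereas the question asks for 1/0.

```python
import pandas as pd

df = pd.DataFrame({"a": ["y", "n"], "b": ["t", "f"],
                   "c": ["known", "unknown"], "d": ["found", "not found"]})

# drop_first=True keeps one indicator column per binary column.
# dtype=int (an addition here) requests 1/0 instead of True/False.
encoded = pd.get_dummies(df, drop_first=True, dtype=int)
print(encoded)
```

Note that the surviving column is named after the label that sorts last (for example `a_y`), so that label is the one encoded as 1.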
If this should be applied only to the binary object columns, subset the frame first.
df = pd.DataFrame({'a': ['y', 'n', 'c'],
                   'b': ['t', 'f', 't'],
                   'c': ['known', 'unknown', 'known'],
                   'd': ['found', 'not found', 'found'],
                   'e': [1, 2, 2]})
pd.get_dummies(df.loc[:, df.agg('nunique') == 2].select_dtypes(include='object'),
               drop_first=True)
#    b_t  c_unknown  d_not found
# 0    1          0            0
# 1    0          1            1
# 2    1          0            0
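The subset above returns only the encoded columns. If the untouched columns (here `a` and `e`) should stay in the result, one possible variation is to pass the binary column names through the `columns` argument of `pd.get_dummies`. This is a sketch rather than part of the original answer; the names `binary_cols` and `result` are illustrative, and `dtype=int` is again added to get 1/0 on newer pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'a': ['y', 'n', 'c'],
                   'b': ['t', 'f', 't'],
                   'c': ['known', 'unknown', 'known'],
                   'd': ['found', 'not found', 'found'],
                   'e': [1, 2, 2]})

# Non-numeric columns with exactly two distinct values.
# 'e' also has two distinct values but is numeric, so it is skipped.
non_numeric = df.select_dtypes(exclude='number').columns
binary_cols = [col for col in non_numeric if df[col].nunique() == 2]

# Only the listed columns are encoded; all others pass through unchanged.
result = pd.get_dummies(df, columns=binary_cols, drop_first=True, dtype=int)
print(result)
```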
If only a small number of distinct binary responses occur across the columns, consider creating a dictionary and mapping the values:
d = {'y': 1, 'n': 0,
     't': 1, 'f': 0,
     'known': 1, 'unknown': 0,
     'found': 1, 'not found': 0}
s = (df.agg('nunique') == 2) & (df.dtypes == 'object')
for col in s[s].index:
    df[col] = df[col].map(d)
#    a  b  c  d  e
# 0  y  1  1  1  1
# 1  n  0  0  0  2
# 2  c  1  1  1  2
#    |
#    `a` not mapped because trinary
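One caveat with the dictionary approach: `Series.map` silently produces NaN for any value that is not a key in the dictionary. With 500+ columns it may be worth checking for that up front. The helper below is a hypothetical sketch (the name `encode_binary_columns` is not from pandas or from the answer above) that applies the same mapping but raises an error if a binary column contains a label missing from the dictionary.

```python
import pandas as pd

def encode_binary_columns(df, mapping):
    """Map two-valued non-numeric columns to 1/0 (hypothetical helper)."""
    out = df.copy()
    for col in out.select_dtypes(exclude='number').columns:
        if out[col].nunique() != 2:
            continue  # not binary, leave the column as it is
        # Fail loudly instead of letting map() insert NaN for unknown labels.
        missing = set(out[col].dropna().unique()) - set(mapping)
        if missing:
            raise ValueError(f"column {col!r} has unmapped values: {sorted(missing)}")
        out[col] = out[col].map(mapping)
    return out

d = {'y': 1, 'n': 0,
     't': 1, 'f': 0,
     'known': 1, 'unknown': 0,
     'found': 1, 'not found': 0}

df = pd.DataFrame({'a': ['y', 'n', 'c'],
                   'b': ['t', 'f', 't'],
                   'c': ['known', 'unknown', 'known'],
                   'd': ['found', 'not found', 'found'],
                   'e': [1, 2, 2]})

encoded = encode_binary_columns(df, d)
print(encoded)
```

Unlike `get_dummies`, this lets you decide explicitly which label becomes 1, at the cost of maintaining the dictionary.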
Upvotes: 1