ChelseaSupencheck

Reputation: 39

How do I split into train and test set after creating dummy variables?

I have already created dummy variables for all my categorical columns, and now I need to split my data into train and test sets, with "Loan_Status" as my target. I am confused because creating dummy variables turns "Loan_Status" into two new columns. When and how should I split my data and create the target?

# Convert the categorical features into dummy variables.

df_dummies = pd.get_dummies(df1, columns=['Gender', 'Married', 'Dependents', 'Education', 'Self_Employed', 'Loan_Status'])
df_dummies.head()

After creating dummy variables, "Loan_Status" turns into two columns.

Before that it was a single column. How would I create the target from "Loan_Status"? Wouldn't splitting the data before creating the dummies cause issues?

Upvotes: 0

Views: 288

Answers (1)

dx2-66

Reputation: 2851

As a rule of thumb, you should stick to pd.get_dummies(..., drop_first=True) to avoid creating redundant columns, since N-1 dummy columns carry full information about N possible values.
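A minimal sketch of what drop_first changes, using a made-up three-row frame (the frame `toy` and its values are assumptions for illustration; only the column names come from the question):

```python
import pandas as pd

# Hypothetical toy data; only the column names are taken from the question.
toy = pd.DataFrame({
    "Married": ["Yes", "No", "Yes"],
    "Education": ["Graduate", "Not Graduate", "Graduate"],
})

# Default behaviour: one dummy column per category value.
full = pd.get_dummies(toy, columns=["Married", "Education"])

# drop_first=True: the first category of each column is dropped,
# because it is implied when all remaining dummies are 0/False.
reduced = pd.get_dummies(toy, columns=["Married", "Education"], drop_first=True)

print(list(full.columns))
print(list(reduced.columns))
```

With two categories per column, the default call yields four dummy columns and the drop_first call yields two.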

However, one-hot encoding is excessive for a binary column such as "Loan_Status"; you're probably better off using something like .map() to turn it into a single 0/1 column.
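Putting this together, one possible workflow is to map the target to 0/1 first, build dummies only for the feature columns, and split afterwards. This is a sketch under assumptions: the small `df1` below is invented stand-in data, the "Y"/"N" labels are a guess at how "Loan_Status" is coded, and scikit-learn's train_test_split is one common way to do the split, not something the question or answer specifies.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the asker's df1 (values are made up).
df1 = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Male", "Female", "Male"],
    "Married": ["Yes", "No", "Yes", "Yes", "No", "Yes", "No", "No"],
    "ApplicantIncome": [5849, 4583, 3000, 2583, 6000, 5417, 2333, 3036],
    "Loan_Status": ["Y", "N", "Y", "Y", "N", "Y", "N", "Y"],
})

# Encode the binary target as a single 0/1 column.
# Assumption: the labels are "Y" and "N"; adjust the mapping to your data.
y = df1["Loan_Status"].map({"Y": 1, "N": 0})

# Create dummies for the features only, leaving the target out of X.
X = pd.get_dummies(
    df1.drop(columns=["Loan_Status"]),
    columns=["Gender", "Married"],
    drop_first=True,
)

# Split features and target together so the rows stay aligned.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```

Creating the dummies before the split keeps the same columns in both sets. If you split first instead, a category that appears in only one of the two sets would produce mismatched columns.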

Upvotes: 0
