WY G

Reputation: 129

pandas.get_dummies returns two columns (_Y and _N) instead of one

I am trying to use sklearn to train a decision tree on my dataset.

When I tried to slice the data into the outcome (Y) and the predicting variables (X), I found that the outcome (my label) is a yes/no (Y/N) value rather than a number:

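#data slicing
X = df.values[:,3:27] #X are the predicting variables (positional columns 3-26); unique_id and student name are dropped here
Y = df['OffTask'].values #Y is our predicted value (outcome), in the 3rd column; select it by name, because df.values is a NumPy array and cannot be indexed by a string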
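A NumPy array (which is what df.values returns) only accepts integer, slice, or boolean indices, so a column name has to be used on the DataFrame itself. The sketch below assumes a hypothetical frame laid out as the comments describe (unique_id, student name, OffTask, then predictors); the names student, feat_a, feat_b and feat_c and all values are made up for illustration.

```python
import pandas as pd

# Hypothetical frame mirroring the layout described in the question:
# column 0 = unique_id, column 1 = student name, column 2 = OffTask (label),
# columns 3 onward = predictor variables (only three here, for brevity).
df = pd.DataFrame({
    "unique_id": [101, 102, 103, 104],
    "student": ["Ann", "Bo", "Cy", "Di"],
    "OffTask": ["N", "Y", "N", "Y"],
    "feat_a": [0.1, 0.5, 0.2, 0.9],
    "feat_b": [3, 1, 4, 1],
    "feat_c": [1.0, 0.0, 1.0, 0.0],
})

# Indexing the NumPy array with a column name fails with an IndexError.
try:
    df.values[:, "OffTask"]
except IndexError as err:
    print("IndexError:", err)

# Positional slice for the predictors (same idea as df.values[:, 3:27]).
X = df.iloc[:, 3:27].to_numpy()

# Label selected by name on the DataFrame, then converted to an array.
Y = df["OffTask"].to_numpy()

print(X.shape)     # (4, 3)
print(Y.tolist())  # ['N', 'Y', 'N', 'Y']
```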

This is what I tried for the label, but I am not sure whether it is the right approach:

#convert the label "OffTask" to dummy (indicator) columns

df1 = pd.get_dummies(df,columns=["OffTask"])
df1

My problem is that df1 splits my label OffTask into two columns, OffTask_N and OffTask_Y, instead of one.

Does anyone know how to fix this?
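The two columns are expected behavior: get_dummies creates one indicator column per distinct value, so a two-valued label yields two complementary columns. A minimal reproduction, assuming a hypothetical frame with the N/Y values implied by the column names above (the score column is made up):

```python
import pandas as pd

# Hypothetical label column with the two values N and Y, plus one other column.
df = pd.DataFrame({"OffTask": ["N", "Y", "N", "Y"], "score": [3, 5, 2, 8]})

# One indicator column per distinct OffTask value; untouched columns come first.
df1 = pd.get_dummies(df, columns=["OffTask"])
print(list(df1.columns))  # ['score', 'OffTask_N', 'OffTask_Y']

# The two indicators are complements: exactly one is set in every row.
print((df1["OffTask_N"] != df1["OffTask_Y"]).all())
```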

Upvotes: 4

Views: 1793

Answers (2)

Pradeep Pandey

Reputation: 307

get_dummies is used to convert nominal (categorical) string values into indicator (0/1) columns. It returns one new column for each unique value in every column it encodes, e.g.:

import pandas as pd

data = {'color': ['red', 'green', 'blue'], 'price': [1200, 3000, 2500]}
my_df = pd.DataFrame(data)
pd.get_dummies(my_df)
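For the example above, only the string column color is expanded (into one column per color), while the numeric price column is passed through unchanged. In pandas 2.0 and later the indicator columns are boolean by default, so the sketch casts them to int to get literal 0/1 values.

```python
import pandas as pd

data = {"color": ["red", "green", "blue"], "price": [1200, 3000, 2500]}
my_df = pd.DataFrame(data)

# Only the string column is encoded; the numeric 'price' column is kept as-is.
dummies = pd.get_dummies(my_df)
print(list(dummies.columns))
# ['price', 'color_blue', 'color_green', 'color_red']

# Cast to int for 0/1 output (pandas 2.0+ returns bool indicator columns).
print(dummies.astype(int))
```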

In your case you can drop the first dummy column (drop_first=True): wherever the remaining column is 0, the row held the first (dropped) value.
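A minimal sketch of that idea, assuming the label values are N and Y as the column names in the question suggest: get_dummies sorts the categories, so N is the one dropped, and a 0 in OffTask_Y can be read back as N.

```python
import pandas as pd

df = pd.DataFrame({"OffTask": ["N", "Y", "N", "Y"]})

# drop_first=True drops the first (sorted) category, here 'N',
# leaving a single OffTask_Y indicator column.
encoded = pd.get_dummies(df, columns=["OffTask"], drop_first=True)
print(list(encoded.columns))  # ['OffTask_Y']

# Rows where OffTask_Y is 0 belonged to the dropped category 'N'.
recovered = encoded["OffTask_Y"].astype(int).map({1: "Y", 0: "N"})
print(recovered.tolist())  # ['N', 'Y', 'N', 'Y']
```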

Upvotes: 1

Venkatachalam

Reputation: 16966

You can make pd.get_dummies return only one column for the label by setting drop_first=True:

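df1 = pd.get_dummies(df, columns=["OffTask"], drop_first=True)  # still contains every other column of df
y = df1["OffTask_Y"]  # the single remaining label column (assuming the values are N/Y)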
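Since the goal in the question is a decision tree, here is a hedged end-to-end sketch: after drop_first=True the predictors are untouched and the label is one 0/1 column that can be split off as y. The frame, the column names feat_a and feat_b, and their values are hypothetical.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: two predictors plus the N/Y label from the question.
df = pd.DataFrame({
    "feat_a": [0.1, 0.5, 0.2, 0.9, 0.4, 0.8],
    "feat_b": [3, 1, 4, 1, 5, 0],
    "OffTask": ["N", "Y", "N", "Y", "N", "Y"],
})

# Only OffTask is encoded; it becomes the single column OffTask_Y.
df1 = pd.get_dummies(df, columns=["OffTask"], drop_first=True)

# Split into predictors and a 0/1 label.
X = df1.drop(columns=["OffTask_Y"])
y = df1["OffTask_Y"].astype(int)

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.predict(X))
```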

However, this is not the recommended way to convert a label to binary values. I would suggest using sklearn's LabelBinarizer for this purpose.

Example:

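import pandas as pd
from sklearn import preprocessing

lb = preprocessing.LabelBinarizer()
labels = pd.DataFrame({'OffTask': ['yes', 'no', 'no', 'yes']})
lb.fit_transform(labels['OffTask'])  # pass the 1-D column; classes are sorted, so 'yes' -> 1

# Output:
array([[1],
       [0],
       [0],
       [1]])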
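Applied to labels coded as N/Y (as in the question), LabelBinarizer returns a column of shape (n, 1); ravel() turns it into the 1-D array that estimators expect for y, and inverse_transform maps the codes back to the original strings. A sketch assuming those N/Y values:

```python
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

# Hypothetical label column using the question's N/Y values.
labels = pd.Series(["N", "Y", "N", "Y"], name="OffTask")

lb = LabelBinarizer()
encoded = lb.fit_transform(labels)  # shape (4, 1); classes_ are sorted: ['N', 'Y']
y = encoded.ravel()                 # 1-D array of 0/1 codes for use as a target

# inverse_transform maps the 0/1 codes back to the original labels.
original = lb.inverse_transform(encoded)
print(lb.classes_.tolist(), y.tolist(), list(original))
```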

Upvotes: 0
