Reputation: 23
I have used the dummyVars function from the caret package before to create dummy variables from character/factor columns that also contain missing values (NA), and it worked successfully.
This time, however, the output I get includes NA values. My understanding is that, by default, it treats NA as a separate entry and creates a dummy variable for it as well. Am I missing something? I am more worried because it worked last time with NA values.
I am using the following code:
dum = dummyVars("~.",data = char_data_raw_train)
char_data_raw_train_dum = predict(dum, newdata = char_data_raw_train)
The output, i.e. char_data_raw_train_dum, includes NAs. Please help.
Upvotes: 0
Views: 802
Reputation: 1007
If you want NA to be a separate level of a one-hot encoded variable from dummyVars, you can use addNA (see ?addNA) to explicitly define it as a level. Here's a small example:
d <- mtcars[,(1:3)]
d$cyl <- factor(d$cyl)
# set some entries to NA
d$cyl[c(1,5,10,15,20)] <- NA
# explicitly define NA as level
d$cyl <- addNA(d$cyl)
library(caret)
data.frame(predict(dummyVars(data = d, formula = ~.), d))
                   mpg cyl.4 cyl.6 cyl.8 cyl.NA  disp
Mazda RX4         21.0     0     0     0      1 160.0
Mazda RX4 Wag     21.0     0     1     0      0 160.0
Datsun 710        22.8     1     0     0      0 108.0
Hornet 4 Drive    21.4     0     1     0      0 258.0
Hornet Sportabout 18.7     0     0     0      1 360.0
Valiant           18.1     0     1     0      0 225.0
[...]
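To apply the same idea to a whole data frame such as char_data_raw_train, one option is to add the NA level to every character/factor column before calling dummyVars. The following is only a sketch: add_na_level is a hypothetical helper name, and toy is made-up data standing in for the real training set, which is not shown in the question.

```r
# Hypothetical helper (not part of caret): make NA an explicit level
# in every character or factor column of a data frame.
add_na_level <- function(df) {
  is_cat <- vapply(df, function(x) is.character(x) || is.factor(x), logical(1))
  # addNA() appends NA as a level; characters are converted to factors first
  df[is_cat] <- lapply(df[is_cat], function(x) addNA(factor(x)))
  df
}

# Toy data standing in for char_data_raw_train (assumed structure)
toy <- data.frame(
  colour = c("red", NA, "blue"),
  size   = factor(c("S", "M", NA)),
  stringsAsFactors = FALSE
)
toy_na <- add_na_level(toy)

# The encoding step itself, run only if caret is installed
if (requireNamespace("caret", quietly = TRUE)) {
  dum <- caret::dummyVars(~ ., data = toy_na)
  toy_dum <- predict(dum, newdata = toy_na)
}
```

Because addNA is called with its default ifany = FALSE, the NA level is added even to columns that currently have no missing values, so those columns get an all-zero NA dummy. That keeps the set of columns stable if new data does contain NAs; pass ifany = TRUE to addNA if you would rather add the level only where NAs actually occur.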
Upvotes: 2