Sourav Sarkar

Reputation: 23

dummyVars producing NA values in output

I have previously used the dummyVars function from the caret package to create dummy variables from character/factor columns that contain missing values (NA), and it worked.

This time, however, the output contains NA values. As I understand it, the default behaviour is to treat NA as a separate entry and create a dummy variable for it as well. Am I missing something? I am particularly puzzled because it worked last time with NA values.

I am using the following code:

dum = dummyVars("~.",data = char_data_raw_train)
char_data_raw_train_dum = predict(dum, newdata = char_data_raw_train)

The output, char_data_raw_train_dum, contains NAs. Please help.

Upvotes: 0

Views: 802

Answers (1)

geekoverdose

Reputation: 1007

If you want NA to be a separate level of a one-hot encoded variable produced by dummyVars, you can use addNA (see ?addNA) to define it explicitly as a factor level. Here's a small example:

library(caret)

d <- mtcars[, 1:3]
d$cyl <- factor(d$cyl)
# set some entries to NA
d$cyl[c(1, 5, 10, 15, 20)] <- NA
# explicitly define NA as a level
d$cyl <- addNA(d$cyl)
# one-hot encode; cyl now gets its own NA column
data.frame(predict(dummyVars(data = d, formula = ~.), d))

                     mpg cyl.4 cyl.6 cyl.8 cyl.NA  disp
Mazda RX4           21.0     0     0     0      1 160.0
Mazda RX4 Wag       21.0     0     1     0      0 160.0
Datsun 710          22.8     1     0     0      0 108.0
Hornet 4 Drive      21.4     0     1     0      0 258.0
Hornet Sportabout   18.7     0     0     0      1 360.0
Valiant             18.1     0     1     0      0 225.0
[...]
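
If several columns are affected, as is probably the case with char_data_raw_train, the same idea can be applied to every character or factor column in one pass. The following is only a minimal sketch: add_na_level is a hypothetical helper name (it is not part of caret), and toy is made-up data standing in for your data frame.

```r
library(caret)

# Hypothetical helper (not part of caret): make NA an explicit level in
# every character or factor column that contains missing values.
add_na_level <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (is.character(x)) x <- factor(x)             # characters become factors
    if (is.factor(x) && anyNA(x)) x <- addNA(x)     # NA becomes a real level
    df[[col]] <- x
  }
  df
}

# Made-up data standing in for char_data_raw_train
toy <- data.frame(
  colour = c("red", NA, "blue", "red"),
  size   = c("S", "M", NA, "M"),
  stringsAsFactors = FALSE
)

toy_fixed <- add_na_level(toy)
dum <- dummyVars(~ ., data = toy_fixed)
toy_dum <- predict(dum, newdata = toy_fixed)

# Check whether any NA is left in the encoded matrix
anyNA(toy_dum)
```

Columns without missing values are left without an NA level, so they do not pick up an extra all-zero dummy column.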

Upvotes: 2
