Reputation: 33
I have a dataset where the outcome variable is a binary categorical variable "diagnosis", which is the type of tumour: "benign" or "malignant".
When converting the variable to numeric ("benign"=0 and "malignant"=1) I use the code:
tumor.df <- fread("df.csv", stringsAsFactors = T)
tumor.df$diagnosis = as.numeric(tumor.df$diagnosis, levels=c('benign', 'malignant'), labels=c(0, 1))
However, instead of diagnosis converting to 0's and 1's, it converts to 1's and 2's. Why is this happening?
Upvotes: 0
Views: 994
Reputation: 226162
Because R stores factors as an underlying set of integer codes (starting from 1) and a set of associated labels.
The simplest fix is to subtract one from the values you got. There are many other ways to do the conversion, which vary in efficiency and readability. One other option would be as.numeric(tumor.df$diagnosis=="malignant") (R converts FALSE to 0 and TRUE to 1).
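To make the behaviour concrete, here is a minimal sketch using a small made-up factor in place of the real column (the vector `diagnosis` and its four values are assumptions for illustration, not the asker's data). It also shows that the `levels` and `labels` arguments passed to as.numeric() in the question have no effect, because as.numeric() does not have those arguments and silently ignores them.

```r
# Hypothetical stand-in for the diagnosis column (not the asker's data)
diagnosis <- factor(c("benign", "malignant", "malignant", "benign"),
                    levels = c("benign", "malignant"))

# Factors are stored as integer codes starting at 1, so this yields 1s and 2s.
# The extra levels/labels arguments are ignored, as in the question.
codes <- as.numeric(diagnosis, levels = c("benign", "malignant"), labels = c(0, 1))

# Option 1: shift the codes down by one
by_subtraction <- codes - 1

# Option 2: compare against the label; FALSE becomes 0 and TRUE becomes 1
by_comparison <- as.numeric(diagnosis == "malignant")

print(codes)           # 1 2 2 1
print(by_subtraction)  # 0 1 1 0
print(by_comparison)   # 0 1 1 0
```

Note that the subtraction approach only gives the intended coding when "benign" is the first level, which can be checked with levels(tumor.df$diagnosis). The comparison approach does not depend on the level order.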
Upvotes: 1