codemachino

Reputation: 33

Converting binary categorical variable to 0's and 1's

I have a dataset where the outcome variable is a binary categorical variable "diagnosis", which is the type of tumour: "benign" or "malignant".

When converting the variable to numeric ("benign"=0 and "malignant"=1) I use the code:

tumor.df <- fread("df.csv", stringsAsFactors = T)
tumor.df$diagnosis = as.numeric(tumor.df$diagnosis, levels=c('benign', 'malignant'), labels=c(0, 1))

However, instead of diagnosis converting to 0's and 1's, it converts to 1's and 2's. Why is this happening?

Upvotes: 0

Views: 994

Answers (1)

Ben Bolker

Reputation: 226162

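Because R stores a factor as an underlying set of integer codes (starting from 1) plus a set of associated labels. as.numeric() returns those codes; levels and labels are arguments to factor(), and as.numeric() does not use them.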
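To see the two pieces separately, here is a minimal sketch on a toy vector. The diagnosis object below is a hypothetical stand-in for the column in the question, not the real data.

```r
# Hypothetical stand-in for tumor.df$diagnosis
diagnosis <- factor(c("benign", "malignant", "malignant", "benign"))

# The labels. factor() sorts them alphabetically by default.
lab <- levels(diagnosis)

# The integer codes. They index into the labels, so they start at 1.
codes <- as.integer(diagnosis)

print(lab)
print(codes)
```

Each code is the position of that value's label in levels(diagnosis), which is why no value is ever 0.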

I would just subtract one from the value that you got. There are lots of other ways to do the conversion, which vary in efficiency and readability. One other option is as.numeric(tumor.df$diagnosis == "malignant") (R converts FALSE to 0 and TRUE to 1).
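A short sketch of both approaches side by side, again on a hypothetical toy factor rather than the real tumor.df. The level order is set explicitly here as an assumption, because the subtraction only gives the intended coding when "benign" is the first level.

```r
# Hypothetical stand-in for tumor.df$diagnosis, with "benign" as level 1
diagnosis <- factor(c("benign", "malignant", "malignant", "benign"),
                    levels = c("benign", "malignant"))

# Option 1: shift the 1-based integer codes down to 0/1
by_subtraction <- as.numeric(diagnosis) - 1

# Option 2: compare against the label; FALSE becomes 0, TRUE becomes 1
by_comparison <- as.numeric(diagnosis == "malignant")

print(by_subtraction)
print(by_comparison)
```

The comparison does not depend on how the levels are ordered, so it may be the safer choice if you are unsure which level came first.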

Upvotes: 1
