Reputation: 534
I am having an issue with displaying the correct grouping of a factor variable after using MICE. I believe this is an R thing, but I included it with mice just to be sure.
So, I run my mice algorithm; here is a snippet of how I format the variable before passing it to mice. Note, I want it to be 0 for no drug and 1 for yes drug, so I coerce it to a factor with levels 0 and 1 before I run it:
mydat$drug=factor(mydat$drug,levels=c(0,1),labels=c(0,1))
I then run mice and it runs logistic regression (this is the default) on drug, along with my other variables to be imputed.
I can extract the results of one of the imputations when it is complete by
drug=complete(imp,1)$drug
We can view it
> head(drug)
[1] 0 0 1 0 1 1
attr(,"contrasts")
2
0 0
1 1
Levels: 0 1
So the data is certainly 0,1.
However, when I do something with it, like cbind, it changes to 1's and 2's
> head(cbind(drug))
drug
[1,] 1
[2,] 1
[3,] 2
[4,] 1
[5,] 2
[6,] 2
Even when I coerce it to a numeric
> head(as.numeric(drug))
[1] 1 1 2 1 2 2
I want to say it has something to do with the contrasts, but when I delete the contrast by doing
attr(drug,"contrasts")=NULL
It still shows up as 1's and 2's when it is passed to and printed by other functions.
I am able to get it to print correctly by using I()
> head(I(drug))
[1] 0 0 1 0 1 1
Levels: 0 1
So, I believe that this is an R issue, but I don't know how to remedy it. Is using I() the correct solution, or is it just a workaround that happens to work here? What is actually happening behind the scenes that is making the output display as 1's and 2's?
Thanks
Upvotes: 3
Views: 7467
Reputation: 131
This is how R encodes factors. The underlying integer codes of a factor always start at 1, as you can see with the following two examples:
as.numeric(factor(c(0,1)))
as.numeric(factor(c("A","B")))
Not sure about the specifics about how MICE works, but if it requires a factor instead of a simple 0/1 numeric variable to use logistic regression, you can always hack the results with something like the following:
as.numeric(as.character(factor(c(0,1))))
or in your specific case
drug <- as.numeric(as.character(drug))
Upvotes: 0
Reputation: 17432
Factors start with the first level being represented internally by 1.
Your two options:
1) Adjust for 1-based index of levels:
as.numeric(drug) - 1
2) Take the labels of the factors and convert to numeric:
as.numeric(as.character(drug))
Some people will point you in the direction of the faster option that does the same thing:
as.numeric(levels(drug))[drug]
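To see where these options differ, here is a minimal sketch using a made-up factor `dose` (not from the question) whose labels are not consecutive integers starting at 0. Subtracting 1 from the codes only matches the labels when the levels happen to be 0, 1, 2, ...; converting through the labels works either way.

```r
# Hypothetical factor with numeric-looking labels 0, 5 and 10
dose <- factor(c(0, 10, 5, 10), levels = c(0, 5, 10))

# Option 1: shift the internal codes (1, 3, 2, 3); the labels are ignored
via_codes <- as.numeric(dose) - 1

# Option 2: go through the labels as character strings
via_chars <- as.numeric(as.character(dose))

# Same result as option 2: convert the levels once, then index by the factor
via_levels <- as.numeric(levels(dose))[dose]

print(via_codes)
print(via_chars)
print(via_levels)
```

For the 0/1 `drug` factor in the question, all three give the same answer.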
I'd also consider using logical values instead of a factor in the first place (convert the original numeric 0/1 column, before it is turned into a factor):
mydat$drug = as.logical(mydat$drug)
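One thing to watch for: `as.logical()` behaves differently on a numeric column and on a factor. A small sketch, with a made-up vector `raw` standing in for the original `mydat$drug`:

```r
# Made-up 0/1 column with one missing value, standing in for mydat$drug
raw <- c(0, 1, NA, 1)

# Numeric input: 0 becomes FALSE, non-zero becomes TRUE, NA stays NA
from_numeric <- as.logical(raw)

# Factor input: as.logical() looks at the labels "0" and "1",
# which are not recognised as logical strings, so everything is NA
from_factor <- as.logical(factor(raw, levels = c(0, 1)))

print(from_numeric)
print(from_factor)
```

So if the column has already been made a factor, convert it back to numeric first.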
Upvotes: 2
Reputation: 32446
The 0s and 1s are the names of your levels. The underlying integers corresponding to those names are 1 and 2. You can see this with str,
str(drug)
# Factor w/ 2 levels "0","1": 2 2 2 2 2 2 1 1 2 2
When you coerce the factor to numeric, you drop the names and get the integer representation.
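As a rough illustration, the pieces can be pulled apart on a small factor rebuilt from the `head(drug)` output in the question (the variable names other than `drug` are just for this sketch):

```r
# Rebuild the first six values shown in the question
drug <- factor(c(0, 0, 1, 0, 1, 1), levels = c(0, 1))

storage <- typeof(drug)   # how the values are stored: "integer"
codes   <- unclass(drug)  # the integer codes, with the levels kept as an attribute
labs    <- levels(drug)   # the labels, which are always character strings

# cbind() builds a plain matrix, so the factor class is dropped
# and only the integer codes remain
as_matrix <- cbind(drug)

# A data frame keeps the column as a factor, so the labels survive
as_frame <- data.frame(drug)

print(as_matrix)
print(as_frame)
```

This is why `cbind(drug)` in the question shows 1's and 2's while printing `drug` itself shows 0's and 1's.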
Upvotes: 2