RayVelcoro

Reputation: 534

Displaying of factor levels and labels in R

I am having an issue with displaying the correct grouping of a factor variable after using MICE. I believe this is an R thing, but I included it with mice just to be sure.

So, I run my mice algorithm; here is a snippet of how I format the variable before passing it to mice. Note, I want it to be 0 for no drug and 1 for yes drug, so I coerce it to a factor with levels 0 and 1 before I run it.

mydat$drug=factor(mydat$drug,levels=c(0,1),labels=c(0,1))

I then run mice and it runs logistic regression (this is the default) on drug, along with my other variables to be imputed.

I can extract the results of one of the imputations when it is complete by

drug=complete(imp,1)$drug

We can view it

> head(drug)
[1] 0 0 1 0 1 1
attr(,"contrasts")
  2
0 0
1 1
Levels: 0 1

So the data is certainly 0,1.

However, when I do something with it, like cbind, it changes to 1's and 2's

> head(cbind(drug))
 drug
[1,]    1
[2,]    1
[3,]    2
[4,]    1
[5,]    2
[6,]    2

Even when I coerce it to a numeric

> head(as.numeric(drug))
[1] 1 1 2 1 2 2

I want to say it has something to do with the contrasts, but when I delete the contrast by doing

attr(drug,"contrasts")=NULL

It still shows up as 1's and 2's when it is passed to other functions and printed.

I am able to get it to print correctly by using I()

> head(I(drug))
[1] 0 0 1 0 1 1
Levels: 0 1

So, I believe that this is an R issue, but I don't know how to remedy it. Is using I() the correct solution, or is it just a workaround that happens to work here? What is actually happening behind the scenes that is making the output display as 1's and 2's?

Thanks

Upvotes: 3

Views: 7467

Answers (3)

Carl Frederick

Reputation: 131

This is how R encodes factors. The underlying numeric representation of a factor always starts at 1, as you can see with the following two examples:

as.numeric(factor(c(0,1)))
as.numeric(factor(c("A","B")))

I'm not sure of the specifics of how MICE works, but if it requires a factor instead of a simple 0/1 numeric variable to use logistic regression, you can always hack the results with something like the following:

as.numeric(as.character(factor(c(0,1)))) 

or in your specific case

drug <- as.numeric(as.character(drug))
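
To make the difference concrete, here is a small self-contained check. The `drug` vector below is a hypothetical stand-in for the imputed column (its values are copied from the `head(drug)` output in the question), so treat it as a sketch rather than the real data:

```r
# Hypothetical stand-in for the imputed column; values are for illustration only
drug <- factor(c(0, 0, 1, 0, 1, 1), levels = c(0, 1))

codes  <- as.numeric(drug)                # internal integer codes, which start at 1
labels <- as.numeric(as.character(drug))  # level labels converted back to numbers

print(codes)   # 1 1 2 1 2 2
print(labels)  # 0 0 1 0 1 1
```

Going through `as.character()` first is what makes R use the labels instead of the codes.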

Upvotes: 0

Señor O

Reputation: 17432

Factors start with the first level being represented internally by 1.

Your two options:

1) Adjust for 1-based index of levels:

as.numeric(drug) - 1

2) Take the labels of the factors and convert to numeric:

as.numeric(as.character(drug))

Some people will point you in the direction of the faster option that does the same thing:

as.numeric(levels(drug))[drug]

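I'd also consider using logical values instead of a factor in the first place. Note that this should be applied to the original numeric 0/1 column; once the column is a factor, as.logical() works on the labels "0" and "1" and returns NA.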

mydat$drug = as.logical(mydat$drug) 
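
As a quick sanity check, the three conversions above can be compared on a toy factor. The vector here is made up for illustration and is not the asker's imputed data:

```r
# Toy factor with labels "0" and "1" (hypothetical values)
drug <- factor(c(0, 0, 1, 0, 1, 1), levels = c(0, 1))

by_offset <- as.numeric(drug) - 1                # shift the 1-based codes down
by_label  <- as.numeric(as.character(drug))      # convert every element's label
by_levels <- as.numeric(levels(drug))[drug]      # convert the levels once, then index

# All three give the same numeric vector for this factor
print(identical(by_offset, by_label))
print(identical(by_label, by_levels))
```

Keep in mind that the `- 1` approach only works because the labels happen to be 0 and 1 in order; the label-based conversions do not rely on that.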

Upvotes: 2

Rorschach

Reputation: 32446

The 0s and 1s are the names of your levels. The underlying integers corresponding to those names are 1 and 2. You can see this with str:

str(drug)
# Factor w/ 2 levels "0","1": 2 2 2 2 2 2 1 1 2 2

When you coerce the factor to numeric, you drop the names and get the integer representation.
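
The sketch below, on a made-up factor rather than the real imputed column, shows the two parts of a factor and what happens to them under cbind(), data.frame() and I(). It suggests that I() is not really converting anything: it only adds the "AsIs" class, so the object still prints as a factor and still holds the codes 1 and 2.

```r
# Made-up factor for illustration; not the asker's data
drug <- factor(c(0, 0, 1, 0, 1, 1), levels = c(0, 1))

print(levels(drug))    # the labels: "0" "1"
print(unclass(drug))   # the stored integer codes, with the levels as an attribute

m  <- cbind(drug)               # matrix: the factor class is dropped, codes remain
df <- data.frame(drug = drug)   # data frame: the column stays a factor

wrapped <- I(drug)              # adds class "AsIs"; the factor itself is unchanged
print(class(wrapped))
print(as.numeric(wrapped))      # still the codes: 1 1 2 1 2 2
```

If the goal is to keep the 0/1 labels visible alongside other columns, building a data frame instead of calling cbind() on the bare factor is one way to do it.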

Upvotes: 2
