Reputation: 31898
I want to find the least-squares means for a dataset with two categorical variables: gender and above/below 55 years of age. The values in the data are the number of hours spent watching TV.
I want the least-squares means for both Age55yr and Gender. The problem is that lsmeans also averages the categorical variables themselves (they are coded as 1 or 2). So instead of getting one row for 1 (male) and one row for 2 (female), I get a single averaged row (with the value 1.51).
The output of lsmeans(tv_age_lm, ~ Gender) is:
$`Gender lsmeans`
Gender lsmean SE df lower.CL upper.CL
1.514563 29.59223 0.4416212 100 28.71607 30.4684
What I expected was something like:
$`Gender lsmeans`
Gender lsmean SE df lower.CL upper.CL
1 29.59223 0.4416212 100 28.71607 30.4684
2 29.59223 0.4416212 100 28.71607 30.4684
That is, I expected each level of my categorical variables to get its own row instead of being averaged. How do I achieve this?
This is the code needed to reproduce the error:
install.packages("lsmeans", repos="http://cran.rstudio.com/")
library(lsmeans)
tvfile <- read.csv2("TVwatch.csv", header=TRUE)
tv_age_lm = lm(TVhrs ~ Age55yr + Gender, data=tvfile)
lsmeans(tv_age_lm, ~ Age55yr)
lsmeans(tv_age_lm, ~ Gender)
The data file is here: http://textuploader.com/1u27
Upvotes: 0
Views: 1200
Reputation: 81683
Currently, the values in the column Gender are stored as integers, so lm treats it as a numeric predictor and lsmeans evaluates the model at its mean. Since it is a categorical variable, you have to convert it to a factor:
tvfile$Gender <- as.factor(tvfile$Gender)
Now you can use lsmeans:
tv_age_lm = lm(TVhrs ~ Age55yr + Gender, data=tvfile)
lsmeans(tv_age_lm, ~ Gender)
# $`Gender lsmeans`
# Gender lsmean SE df lower.CL upper.CL
# 1 26.84099 0.6355195 100 25.58013 28.10184
# 2 32.18775 0.6171792 100 30.96328 33.41222
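The question also asks for lsmeans(tv_age_lm, ~ Age55yr), so the same conversion presumably applies to that column. Here is a minimal sketch of converting both columns before fitting. The data frame tv_demo and its values are made up for illustration (they are not the contents of TVwatch.csv). The male/female labels follow the 1 = male, 2 = female coding stated in the question. Age55yr is left with its 1/2 levels because the question does not say which code means which.

```r
# Hypothetical stand-in for TVwatch.csv: same column names, made-up values.
set.seed(1)
tv_demo <- data.frame(
  Gender  = rep(c(1L, 2L), times = 10),
  Age55yr = rep(c(1L, 2L), each = 10),
  TVhrs   = round(runif(20, min = 20, max = 40))
)

# Integer codes: lm() would treat these columns as numeric predictors.
class(tv_demo$Gender)

# Convert the coded columns to factors before fitting.
# Labels for Gender assume 1 = male, 2 = female, as stated in the question.
tv_demo$Gender  <- factor(tv_demo$Gender, levels = c(1, 2),
                          labels = c("male", "female"))
tv_demo$Age55yr <- factor(tv_demo$Age55yr)

fit_demo <- lm(TVhrs ~ Age55yr + Gender, data = tv_demo)

# Factor predictors appear as level contrasts in the coefficient names.
names(coef(fit_demo))

# With factors in place, lsmeans() reports one row per level.
# Guarded so the sketch still runs when the package is not installed.
if (requireNamespace("lsmeans", quietly = TRUE)) {
  print(lsmeans::lsmeans(fit_demo, ~ Gender))
  print(lsmeans::lsmeans(fit_demo, ~ Age55yr))
}
```

A quick way to check your own data is str(tvfile): any column you mean as categorical should be listed as Factor rather than int before you call lm.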
Upvotes: 3