lsmeans takes the mean of my categorical variables - how to avoid?

Question

I want to find the least-squares means for a dataset with two categorical variables: gender and above/below 55 years of age. The response is the number of hours spent watching TV.

I want the least-squares means for both Age55yr and Gender. The problem is that lsmeans also averages the categorical variables themselves (they are coded as 1 or 2). So instead of one row for 1 (male) and one row for 2 (female), I get a single averaged row (Gender = 1.51).

The output of lsmeans(tv_age_lm, ~ Gender) is:

$`Gender lsmeans`
   Gender   lsmean        SE  df lower.CL upper.CL
 1.514563 29.59223 0.4416212 100 28.71607  30.4684

What I expected was something like this (the numbers are only placeholders):

 $`Gender lsmeans`
   Gender   lsmean        SE  df lower.CL upper.CL
        1   29.59223 0.4416212 100 28.71607  30.4684
        2   29.59223 0.4416212 100 28.71607  30.4684

That is, I expected each level of a categorical variable to get its own row instead of being averaged. How do I achieve this?

This is the code needed to reproduce the problem:

install.packages("lsmeans", repos="http://cran.rstudio.com/")
library(lsmeans)
tvfile <- read.csv2("TVwatch.csv", header=TRUE)
tv_age_lm = lm(TVhrs ~ Age55yr + Gender, data=tvfile)
lsmeans(tv_age_lm, ~ Age55yr)
lsmeans(tv_age_lm, ~ Gender)

The data file is here: http://textuploader.com/1u27
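A self-contained variant of the problem, using made-up data instead of TVwatch.csv, shows where the single averaged row comes from. This is a hypothetical sketch: the data frame `toy` and its values are invented, and the reference grid is built by hand to mimic what, as far as I know, lsmeans does by default with numeric predictors (it reduces each one to its mean).

```r
# Hypothetical toy data standing in for TVwatch.csv (values are made up).
# Gender and Age55yr are stored as integers 1/2, as in the question.
set.seed(42)
n <- 103
toy <- data.frame(
  Gender  = sample(1:2, n, replace = TRUE),
  Age55yr = sample(1:2, n, replace = TRUE)
)
toy$TVhrs <- 25 + 5 * (toy$Gender == 2) + 2 * (toy$Age55yr == 2) + rnorm(n, sd = 4)

# With integer columns, lm() treats Gender as a numeric covariate:
# the model has a single slope for Gender, not one effect per level.
fit_num <- lm(TVhrs ~ Age55yr + Gender, data = toy)
print(names(coef(fit_num)))

# Reducing the numeric covariates to their mean leaves a one-row grid,
# with Gender at a value strictly between 1 and 2 (like the 1.51 above).
grid_num <- data.frame(Age55yr = mean(toy$Age55yr), Gender = mean(toy$Gender))
print(grid_num)
print(predict(fit_num, newdata = grid_num))
```

The single prediction at that one-row grid is what ends up as the lone "least-squares mean".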

Sven Hohenstein · Accepted Answer

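Currently, the values in column Gender are stored as integers, so lm() treats Gender as a numeric covariate, and lsmeans evaluates numeric covariates at their mean (that is where the 1.51 comes from). Since Gender is a categorical variable, you have to convert it to a factor (the same goes for Age55yr):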

tvfile$Gender <- as.factor(tvfile$Gender)
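To see what the conversion changes, the sketch below uses a short hypothetical vector in place of tvfile$Gender (the names `gender_int`, `gender_fac` and `gender_lab` are made up). It compares the design-matrix columns R builds for an integer versus a factor, and shows the optional `labels` argument of factor(), which gives the lsmeans rows readable names, assuming 1 = male and 2 = female as stated in the question.

```r
# Hypothetical integer codes standing in for tvfile$Gender (1 = male, 2 = female)
gender_int <- c(1L, 2L, 2L, 1L, 2L)

# as.factor() keeps the codes "1" and "2" as level labels
gender_fac <- as.factor(gender_int)
print(levels(gender_fac))

# Optionally attach readable labels; the order follows `levels`
gender_lab <- factor(gender_int, levels = 1:2, labels = c("male", "female"))
print(levels(gender_lab))

# Integer: one numeric column. Factor: a dummy column for level "2"
# (treatment contrasts, R's default for unordered factors).
print(colnames(model.matrix(~ gender_int)))
print(colnames(model.matrix(~ gender_fac)))
```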

Now refit the model and use lsmeans:

tv_age_lm = lm(TVhrs ~ Age55yr + Gender, data=tvfile)

lsmeans(tv_age_lm, ~ Gender)

#  $`Gender lsmeans`
#   Gender   lsmean        SE  df lower.CL upper.CL
#        1 26.84099 0.6355195 100 25.58013 28.10184
#        2 32.18775 0.6171792 100 30.96328 33.41222
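For a model like this one (Gender a factor, Age55yr still numeric), the least-squares means can also be reproduced with base R, which makes it clearer what lsmeans is computing. This is a sketch under assumptions: `toy` is invented data, and the grid (every Gender level, Age55yr held at its mean) is my reading of lsmeans' default reference grid, not its actual code. If that reading is right, the columns should agree with lsmeans(fit, ~ Gender) up to rounding.

```r
# Hypothetical toy data (made up); Gender converted to a factor,
# Age55yr left numeric, mirroring the model in the answer.
set.seed(42)
n <- 103
toy <- data.frame(
  Gender  = factor(sample(1:2, n, replace = TRUE)),
  Age55yr = sample(1:2, n, replace = TRUE)
)
toy$TVhrs <- 25 + 5 * (toy$Gender == "2") + 2 * (toy$Age55yr == 2) + rnorm(n, sd = 4)

fit <- lm(TVhrs ~ Age55yr + Gender, data = toy)

# Reference grid: one row per Gender level, numeric Age55yr at its mean
grid <- data.frame(
  Gender  = factor(levels(toy$Gender), levels = levels(toy$Gender)),
  Age55yr = mean(toy$Age55yr)
)

# Predictions at the grid are the least-squares means; predict.lm also
# returns standard errors, the residual df and t-based 95% confidence limits
pred <- predict(fit, newdata = grid, se.fit = TRUE, interval = "confidence")

lsm <- data.frame(
  Gender   = grid$Gender,
  lsmean   = pred$fit[, "fit"],
  SE       = pred$se.fit,
  df       = pred$df,
  lower.CL = pred$fit[, "lwr"],
  upper.CL = pred$fit[, "upr"]
)
print(lsm)
```

If Age55yr is converted to a factor as well, lsmeans no longer plugs in mean(Age55yr); by default it averages the predictions over the two age groups with equal weights, so the Gender means can shift slightly.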
