I have a theoretical question about how R works when producing model summaries. I am fitting some linear regression models in which two of my variables are categorical, each with 3 levels corresponding to genotypes. I know that only two of the levels will show in the model summary, since one of the levels has to be the reference. However, in these variables one of the levels has only 1 count, as in:
Variable 1 levels: TT 176 counts / TC 45 counts / CC 1 count (out of 223 individuals genotyped).
Now, this CC level usually doesn't show up in the model summary, and I'm assuming that's because R ignores it, since there is only 1. All I need, then, is a literature reference to confirm or refute my assumption. I've been trying to google this in different ways and have gone through the R help for lm (?lm) and other related searches, but either I haven't found what I'm looking for, or I have and didn't recognize it as such.
Any help would be greatly appreciated!
Your assumption is incorrect.
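The first level of a factor is the reference level, and by default the levels are sorted alphabetically. Because CC comes first alphabetically, it is the reference level in your model: it doesn't appear in the summary because it is absorbed into the intercept, and the TC and TT coefficients are each estimated relative to it.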
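A minimal sketch of this, using simulated data: the genotype counts mirror the ones in the question, but the outcome `y` is random noise, and the names `dat`, `genotype` and `y` are hypothetical. It shows that `factor()` puts CC first and that the fitted coefficients are named only for TC and TT.

```r
# Simulated data for illustration only: genotype counts mirror the question,
# the outcome `y` is random noise (hypothetical names `dat`, `genotype`, `y`).
set.seed(1)
genotype <- rep(c("TT", "TC", "CC"), times = c(176, 45, 1))
dat <- data.frame(genotype = factor(genotype),
                  y = rnorm(length(genotype)))

# factor() sorts the levels alphabetically by default, so CC comes first
levels(dat$genotype)
# returns c("CC", "TC", "TT")

fit <- lm(y ~ genotype, data = dat)
# CC (the first level) is absorbed into the intercept; TC and TT each get a
# coefficient measured relative to CC.
names(coef(fit))
# returns c("(Intercept)", "genotypeTC", "genotypeTT")
```

The single CC observation is used in the fit; it simply has no row of its own in the coefficient table.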
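It is good practice to use a relatively common level as the reference: every other coefficient is a comparison against the reference group, so a reference with very few observations inflates all of their standard errors. Thus I would suggest changing the alphabetical default to make TT the reference level. This should be as easy as
your_data$var <- relevel(factor(your_data$var), ref = "TT")
(of course substituting your own data frame and variable names; relevel() only accepts factors, so the factor() call covers the case where the column is stored as character).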
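A sketch of the effect of the reference level on the standard errors, again on simulated data with hypothetical names (`dat`, `genotype`, `y`) and the genotype counts from the question. It fits the model with CC and then TT as the baseline and compares the standard error of the TC coefficient. In a one-factor model the residual standard error is identical under both baselines, so the ratio depends only on group sizes: sqrt((1/45 + 1/1) / (1/45 + 1/176)), roughly 6, whatever the random outcome is.

```r
# Simulated data for illustration only (hypothetical names, random outcome).
set.seed(1)
genotype <- rep(c("TT", "TC", "CC"), times = c(176, 45, 1))
dat <- data.frame(genotype = factor(genotype),
                  y = rnorm(length(genotype)))

fit_cc <- lm(y ~ genotype, data = dat)   # default baseline: CC (n = 1)

# Move TT to the front; relevel() keeps the remaining levels in order.
dat$genotype <- relevel(factor(dat$genotype), ref = "TT")
levels(dat$genotype)
# returns c("TT", "CC", "TC")
fit_tt <- lm(y ~ genotype, data = dat)   # new baseline: TT (n = 176)

# Standard error of the TC-vs-reference coefficient under each baseline
se_cc <- coef(summary(fit_cc))["genotypeTC", "Std. Error"]
se_tt <- coef(summary(fit_tt))["genotypeTC", "Std. Error"]
c(CC_reference = se_cc, TT_reference = se_tt)
```

Changing the reference only reparameterizes the model: the fitted values are the same, but the coefficients now answer "how does each genotype differ from TT?", which is estimated far more precisely.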
The way a factor is turned into model-matrix columns, and therefore which level serves as the reference, is controlled by its "contrasts". ?contrasts is a good place to begin reading, and with that search term you should be able to find other documentation and references as well. (There are options other than "compare everything to the reference level", but they are beyond the scope of this answer.)
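To see what the default contrasts look like, here is a small sketch with a hypothetical three-level factor `g` using the genotype labels from the question. R's default for unordered factors is treatment (dummy) coding, and the reference level is the row of the contrast matrix that is all zeros.

```r
# Hypothetical factor with the three genotype levels from the question
g <- factor(c("TT", "TC", "CC"))

# Default coding for unordered factors in a standard R session
getOption("contrasts")[["unordered"]]
# returns "contr.treatment"

# Rows are levels, columns are model-matrix columns; the all-zero row
# (CC, the first level) is the reference.
contrasts(g)

# After relevel(), TT is first and becomes the all-zero row instead.
contrasts(relevel(g, ref = "TT"))
```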
Separately, it seems suspect to include a level with only a single observation at all, but that is a statistical question rather than a programming one (and would require more information than your question gives), so I won't address it further here.