Reputation: 2977
I'm a complete beginner with R and I need to run regressions on some data sets. My problem is that I'm not sure how to rewrite a model formula as a mathematical formula.
The most confusing parts are interactions and the `poly` function.
Can they be understood as a product and a polynomial?
Consider the following model, where both `a` and `b` are numeric vectors:
y ~ poly(a, 2):b
Can it be rewritten mathematically like this?
y = a*b + a^2 * b
And when I get the following term in the fit summary
poly(a, 2)2:b
is it equal to the following formula?
a^2 * b
Upvotes: 1
Views: 1457
Reputation: 73405
Your question is twofold:
1. what does `poly` do;
2. what does `:` do.
For the first question, I refer you to my answer https://stackoverflow.com/a/39051154/4891738 for a complete explanation of `poly`. Note that for most users, it is sufficient to know that it generates a design matrix with `degree` columns, each of which is a basis function.
`:` is not a mystery. In your case, where `b` is also numeric, `poly(a, 2):b` will return
Xa <- poly(a, 2) # a matrix of two columns
X <- Xa * b # row scaling to Xa by b
So your guess in the question is correct. But note that `poly` gives you an orthogonal polynomial basis, so it is not the same as `I(a)` and `I(a^2)`. You can set `raw = TRUE` when calling `poly` to get the ordinary polynomial basis.
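To make the difference between the two bases concrete, here is a minimal sketch on simulated data. The data and the names `X_orth`, `X_raw`, `fit_orth` and `fit_raw` are made up for illustration. It checks that the raw basis is just `a` and `a^2`, that the default basis has orthonormal columns, and that a plain polynomial regression gives the same fitted values with either basis.

```r
set.seed(1)
a <- runif(20)
y <- 1 + 2 * a - 3 * a^2 + rnorm(20, sd = 0.1)  # simulated response

X_orth <- poly(a, 2)              # default: orthogonal polynomial basis
X_raw  <- poly(a, 2, raw = TRUE)  # ordinary powers of a

# The raw columns are simply a and a^2
raw_is_powers <- isTRUE(all.equal(as.numeric(X_raw[, 1]), a)) &&
  isTRUE(all.equal(as.numeric(X_raw[, 2]), a^2))

# The orthogonal columns are orthonormal: their cross-product is the identity
orth_is_orthonormal <- max(abs(crossprod(X_orth) - diag(2))) < 1e-8

# For a plain polynomial regression, both bases span the same space,
# so the fitted values agree even though the coefficients differ
fit_orth <- lm(y ~ poly(a, 2))
fit_raw  <- lm(y ~ poly(a, 2, raw = TRUE))
same_fit <- isTRUE(all.equal(fitted(fit_orth), fitted(fit_raw)))
```

The coefficients of `fit_raw` are the ones that match a formula written in terms of `a` and `a^2`.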
`Xa` has column names. `poly(a, 2)2` just means the 2nd column of `Xa`.
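The row scaling described above can be checked against what R builds internally. This is only a sketch with simulated `a` and `b`; `mm` and `X_manual` are illustrative names. It compares the interaction columns from `model.matrix` with the manual product `Xa * b`.

```r
set.seed(42)
a <- runif(10)
b <- rnorm(10)   # numeric b

Xa <- poly(a, 2)          # matrix with two columns
X_manual <- Xa * b        # row i of Xa is multiplied by b[i]

# Design matrix R constructs for the interaction-only formula
mm <- model.matrix(~ poly(a, 2):b)
term_names <- colnames(mm)

# Drop the intercept column and compare with the manual product
matches <- isTRUE(all.equal(as.numeric(mm[, -1]), as.numeric(X_manual)))
```

Here `term_names` holds `"(Intercept)"`, `"poly(a, 2)1:b"` and `"poly(a, 2)2:b"`, which are the labels that show up in the fit summary.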
Note that when `b` is a factor, there will be a design matrix, say `Xb`, for `b`. This is a 0-1 binary matrix, because factor variables are coded as dummy variables. Then `poly(a, 2):b` forms a row-wise Kronecker product between `Xa` and `Xb`. This sounds tricky, but it is essentially just pairwise multiplication between all columns of the two matrices. So if `Xa` has `ka` columns and `Xb` has `kb` columns, the resulting matrix has `ka * kb` columns. Such mixing is called 'interaction'.
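As a rough illustration of the pairwise column products, the sketch below builds them by hand for a three-level factor and compares them with `model.matrix`. The data, the level names `u`, `v`, `w`, and the helper names `pairs` and `X_manual` are all made up for this example. One thing to note: in this interaction-only formula there is no lower-order term, and R keeps one dummy column per level of `b`, so `Xb` is taken as the full indicator matrix here.

```r
set.seed(3)
a <- runif(12)
b <- factor(rep(c("u", "v", "w"), times = 4))  # factor with 3 levels

Xa <- poly(a, 2)               # ka = 2 columns
Xb <- model.matrix(~ b - 1)    # kb = 3 indicator columns: bu, bv, bw

# All (j, k) column pairs; j (the column of Xa) varies fastest
pairs <- expand.grid(j = seq_len(ncol(Xa)), k = seq_len(ncol(Xb)))

# Pairwise products of columns: the row-wise Kronecker product
X_manual <- sapply(seq_len(nrow(pairs)), function(i) {
  as.numeric(Xa[, pairs$j[i]]) * Xb[, pairs$k[i]]
})
colnames(X_manual) <- paste0("poly(a, 2)", pairs$j, ":", colnames(Xb)[pairs$k])

# Compare with the interaction columns R builds, matched by name
mm <- model.matrix(~ poly(a, 2):b)
matches <- isTRUE(all.equal(as.numeric(mm[, colnames(X_manual)]),
                            as.numeric(X_manual)))
```

The manual matrix has `ncol(Xa) * ncol(Xb)` columns, six in this case, one per combination of basis function and factor level.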
The resulting matrix also has column names. For example, `poly(a, 2)2:b3` means the product of the 2nd column of `Xa` and the dummy column in `Xb` for the third level of `b`. I am not saying 'the 3rd column of `Xb`', as this is false if `b` is contrasted. Usually a factor will be contrasted, so if `b` has 5 levels, `Xb` will have 4 columns. Then the dummy column for the third level will be the 2nd column of `Xb`, if the first factor level is the reference level (hence not appearing in `Xb`).
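The naming can be checked with a small sketch. It assumes R's default treatment contrasts and a formula that also contains the main effects (`poly(a, 2) * b`), which is the situation where `b` is contrasted. The data are simulated, and `col_manual` and `col_model` are illustrative names.

```r
set.seed(7)
a <- runif(20)
b <- factor(rep(1:5, each = 4))   # factor with 5 levels: "1", ..., "5"

# Main effects plus interaction, so b is coded with treatment contrasts
mm <- model.matrix(~ poly(a, 2) * b)

Xa <- poly(a, 2)
Xb <- mm[, paste0("b", 2:5)]      # 4 dummy columns; level "1" is the reference

# The dummy for the third level is the 2nd column of Xb
third_level_col <- colnames(Xb)[2]

# poly(a, 2)2:b3 is the 2nd column of Xa times the dummy for level "3"
col_manual <- as.numeric(Xa[, 2]) * as.numeric(b == "3")
col_model  <- as.numeric(mm[, "poly(a, 2)2:b3"])
matches <- isTRUE(all.equal(col_manual, col_model))
```

With a different reference level or a different contrast setting, the position of the dummy column inside `Xb` changes, which is why it is safer to go by the column name than by position.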
Upvotes: 2