Reputation: 525
Suppose I have a data frame with three columns:
set.seed(24)
df1 <- data.frame(a=runif(10),b=runif(10),c=runif(10))
I want to build a new one with six columns holding all pairwise products, squares included:
a*a, a*b, a*c, b*c, b*b, c*c
The solution I'm looking for should work for any number of columns, not just three.
Upvotes: 3
Views: 1839
Reputation: 12937
Here is my solution, clear and concise, and works for any number of columns:
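n <- ncol(df1)
# column-index pairs for the cross terms: (1,2), (1,3), (2,3), ...
combb <- combn(n, 2)
# append the (i,i) pairs for the squared terms
combb <- cbind(combb, sapply(1:n, function(i) rep(i, 2)))
# for each row, multiply the two entries selected by each index pair
res <- apply(df1, 1, function(x) { apply(combb, 2, function(y) prod(x[y])) })
t(res)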
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 0.17697473 0.02748285 0.056820059 0.08559952 0.365890531 0.008823729
# [2,] 0.08337501 0.12419698 0.204739766 0.05057603 0.137444401 0.304984209
# [3,] 0.47301970 0.51068123 0.487089495 0.49592997 0.451167798 0.525871254
# [4,] 0.34920860 0.07156869 0.092820832 0.26925425 0.452905189 0.019023202
# [5,] 0.21232357 0.14774167 0.071445132 0.43906475 0.102675746 0.049713853
# [6,] 0.83221898 0.63296215 0.621757189 0.84721676 0.817486693 0.472890881
# [7,] 0.05542008 0.02139673 0.015153719 0.07825199 0.039249934 0.005850588
# [8,] 0.03376319 0.45808619 0.026509902 0.58342170 0.001953909 0.359676293
# [9,] 0.40155468 0.50514566 0.315655035 0.64261164 0.250923183 0.397086073
# [10,] 0.03535148 0.01187911 0.006472142 0.06488487 0.019260683 0.002174826
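The row-wise apply above can also be written with vectorized column indexing, which additionally makes it easy to label the result. This is only a sketch of that variation, not part of the original answer: the names m, idx and out are made up for illustration, and it assumes the data frame is all numeric with at least two columns.

```r
set.seed(24)
df1 <- data.frame(a = runif(10), b = runif(10), c = runif(10))

m <- as.matrix(df1)
n <- ncol(m)

# 2 x k matrix of column-index pairs: cross terms first, then (i, i) for squares
idx <- cbind(combn(n, 2), rbind(seq_len(n), seq_len(n)))

# multiply the selected columns element-wise, one output column per pair
out <- m[, idx[1, ], drop = FALSE] * m[, idx[2, ], drop = FALSE]

# label each column with the pair of source columns it came from
colnames(out) <- paste(colnames(m)[idx[1, ]], colnames(m)[idx[2, ]], sep = "*")
out
```

The column order is the same as in the output above: cross products first, squares last.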
Upvotes: 0
Reputation: 887108
Here is another option with combn: take the column names two at a time, multiply the corresponding columns after subsetting, and cbind the result with the square of the original dataset.
res <- cbind(df1^2, do.call(cbind,combn(colnames(df1), 2,
FUN= function(x) list(df1[x[1]]*df1[x[2]]))))
colnames(res)[-(seq_len(ncol(df1)))] <- combn(colnames(df1), 2,
FUN = paste, collapse=":")
res
# a b c a:b a:c b:c
#1 0.08559952 0.365890531 0.008823729 0.17697473 0.02748285 0.056820059
#2 0.05057603 0.137444401 0.304984209 0.08337501 0.12419698 0.204739766
#3 0.49592997 0.451167798 0.525871254 0.47301970 0.51068123 0.487089495
#4 0.26925425 0.452905189 0.019023202 0.34920860 0.07156869 0.092820832
#5 0.43906475 0.102675746 0.049713853 0.21232357 0.14774167 0.071445132
#6 0.84721676 0.817486693 0.472890881 0.83221898 0.63296215 0.621757189
#7 0.07825199 0.039249934 0.005850588 0.05542008 0.02139673 0.015153719
#8 0.58342170 0.001953909 0.359676293 0.03376319 0.45808619 0.026509902
#9 0.64261164 0.250923183 0.397086073 0.40155468 0.50514566 0.315655035
#10 0.06488487 0.019260683 0.002174826 0.03535148 0.01187911 0.006472142
Upvotes: 4
Reputation: 73285
Let df be your data frame; try this:
formula <- ~ I(a^2) + I(b^2) + I(c^2) + a:b + a:c + b:c - 1
X <- model.matrix(formula, df)
Use -1 to drop the intercept, i.e., the all-ones column. Use I() to protect a^2, so that ^ is treated as arithmetic rather than as a formula operator.
It does not really matter whether you also have 3-way interactions; model.matrix() can handle them easily.
For your example data frame, you can get something like:
> X
I(a^2) I(b^2) I(c^2) a:b a:c b:c
1 0.02830988 0.290128663 0.8060044 0.09062841 0.15105592 0.48357521
2 0.78597627 0.451852115 0.1003373 0.59594047 0.28082514 0.21292636
3 0.36190629 0.117679147 0.5325122 0.20637060 0.43899829 0.25033093
4 0.83645938 0.006638227 0.9812959 0.07451582 0.90598796 0.08070976
5 0.50038157 0.197485843 0.6194279 0.31435374 0.55673179 0.34975454
6 0.25813071 0.567147970 0.5028665 0.38262032 0.36028502 0.53404096
7 0.51074360 0.219564943 0.1966824 0.33487518 0.31694526 0.20780897
8 0.37611759 0.752857721 0.3169607 0.53213065 0.34527451 0.48849390
9 0.00562814 0.627098114 0.8408894 0.05940872 0.06879421 0.72616812
10 0.78306385 0.405336110 0.3063323 0.56338624 0.48977313 0.35237413
attr(,"assign")
[1] 1 2 3 4 5 6
I did not set seed, so the numbers may be different when you test.
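The formula above is written out by hand for three columns. Since the question asks for any number of columns, here is a minimal sketch of assembling the same formula from the column names. The names vars, sq_terms, int_terms, f, X_gen and X3 are illustrative only, and the sketch assumes the column names are syntactically valid R names (otherwise they would need backtick quoting).

```r
set.seed(24)
df1 <- data.frame(a = runif(10), b = runif(10), c = runif(10))

vars <- colnames(df1)

# squared terms protected with I(), e.g. "I(a^2)"
sq_terms <- paste0("I(", vars, "^2)")

# pairwise interaction terms, e.g. "a:b"
int_terms <- combn(vars, 2, FUN = paste, collapse = ":")

# one-sided formula without intercept: ~ I(a^2) + ... + a:b + ... - 1
f <- reformulate(c(sq_terms, int_terms), intercept = FALSE)
X_gen <- model.matrix(f, df1)

# a three-way interaction is written the same way
X3 <- model.matrix(~ a:b:c - 1, df1)
```

reformulate() from the stats package is used here only to avoid pasting the whole formula string together by hand.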
model.matrix() is the usual way to construct the model (design) matrix in regression analysis. In your case you only have numerical data, but it can also handle factor-numeric and factor-factor interactions.
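As a small illustration of the factor-numeric case, the sketch below uses a made-up data frame d with a numeric column x and a three-level factor g; neither appears in the question. With no main effects and no intercept in the formula, the interaction expands to one column per level of g, holding x for rows in that level and 0 elsewhere.

```r
set.seed(24)
# hypothetical data: one numeric column and one three-level factor
d <- data.frame(x = runif(6),
                g = factor(rep(c("u", "v", "w"), each = 2)))

# factor-numeric interaction without intercept
Xf <- model.matrix(~ g:x - 1, d)
Xf
```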
Upvotes: 3