sheß

Reputation: 525

Efficient ways to multiply all columns in data frame with each other

Assume I have a data frame consisting of three columns:

set.seed(24)
df1 <- data.frame(a=runif(10),b=runif(10),c=runif(10))

and I want a new one with six columns holding all pairwise products (interactions), including the squares:

a*a, a*b, a*c, b*c, b*b, c*c

The solution I'm looking for should work for any number of columns, not just three.

Upvotes: 3

Views: 1839

Answers (3)

989

Reputation: 12937

Here is my solution, clear and concise, and works for any number of columns:

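n <- ncol(df1)
# all pairs of distinct column indices (one pair per column of combb)
combb <- combn(n, 2)
# append the self-pairs (i, i) so that the squares are included
combb <- cbind(combb, sapply(1:n, function(i) rep(i, 2)))
# for each row of df1, multiply the two entries selected by each index pair
res <- apply(df1, 1, function(x) apply(combb, 2, function(y) prod(x[y])))
# apply() returns one column per row of df1, so transpose
t(res)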

          # [,1]       [,2]        [,3]       [,4]        [,5]        [,6]
 # [1,] 0.17697473 0.02748285 0.056820059 0.08559952 0.365890531 0.008823729
 # [2,] 0.08337501 0.12419698 0.204739766 0.05057603 0.137444401 0.304984209
 # [3,] 0.47301970 0.51068123 0.487089495 0.49592997 0.451167798 0.525871254
 # [4,] 0.34920860 0.07156869 0.092820832 0.26925425 0.452905189 0.019023202
 # [5,] 0.21232357 0.14774167 0.071445132 0.43906475 0.102675746 0.049713853
 # [6,] 0.83221898 0.63296215 0.621757189 0.84721676 0.817486693 0.472890881
 # [7,] 0.05542008 0.02139673 0.015153719 0.07825199 0.039249934 0.005850588
 # [8,] 0.03376319 0.45808619 0.026509902 0.58342170 0.001953909 0.359676293
 # [9,] 0.40155468 0.50514566 0.315655035 0.64261164 0.250923183 0.397086073
# [10,] 0.03535148 0.01187911 0.006472142 0.06488487 0.019260683 0.002174826
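The row-wise apply above calls prod() once per row and pair, which may get slow on large data. A minimal sketch of the same idea that multiplies whole columns at once, assuming all columns of df1 are numeric and there are at least two of them (the names m and idx are only illustrative):

```r
set.seed(24)
df1 <- data.frame(a = runif(10), b = runif(10), c = runif(10))

m <- as.matrix(df1)   # assumes every column is numeric
n <- ncol(m)

# 2 x k index matrix: distinct pairs first, then the self-pairs (i, i)
idx <- cbind(combn(n, 2), rbind(seq_len(n), seq_len(n)))

# element-wise product of the two selected column sets
res <- m[, idx[1, ], drop = FALSE] * m[, idx[2, ], drop = FALSE]
colnames(res) <- paste(colnames(m)[idx[1, ]], colnames(m)[idx[2, ]], sep = "*")
res
```

The columns come out in the same order as in the output above (a*b, a*c, b*c, a*a, b*b, c*c), but now carry names.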

Upvotes: 0

akrun

Reputation: 887108

Here is another option with combn: take the column names two at a time, multiply the corresponding columns after subsetting, and cbind the result with the square of the original dataset.

res <- cbind(df1^2, do.call(cbind,combn(colnames(df1), 2, 
               FUN= function(x) list(df1[x[1]]*df1[x[2]]))))
colnames(res)[-(seq_len(ncol(df1)))] <-  combn(colnames(df1), 2, 
                 FUN = paste, collapse=":")
res
#            a           b           c        a:b        a:c         b:c
#1  0.08559952 0.365890531 0.008823729 0.17697473 0.02748285 0.056820059
#2  0.05057603 0.137444401 0.304984209 0.08337501 0.12419698 0.204739766
#3  0.49592997 0.451167798 0.525871254 0.47301970 0.51068123 0.487089495
#4  0.26925425 0.452905189 0.019023202 0.34920860 0.07156869 0.092820832
#5  0.43906475 0.102675746 0.049713853 0.21232357 0.14774167 0.071445132
#6  0.84721676 0.817486693 0.472890881 0.83221898 0.63296215 0.621757189
#7  0.07825199 0.039249934 0.005850588 0.05542008 0.02139673 0.015153719
#8  0.58342170 0.001953909 0.359676293 0.03376319 0.45808619 0.026509902
#9  0.64261164 0.250923183 0.397086073 0.40155468 0.50514566 0.315655035
#10 0.06488487 0.019260683 0.002174826 0.03535148 0.01187911 0.006472142

Upvotes: 4

Zheyuan Li

Reputation: 73285

Let df be your data frame, try this:

formula <- ~ I(a^2) + I(b^2) + I(c^2) + a:b + a:c + b:c - 1
X <- model.matrix(formula, df)

Use -1 to drop the intercept, i.e., the column of all 1s. Use I() to protect a^2, so that ^ is treated as arithmetic rather than as a formula operator.

It does not really matter whether you also have 3-way interactions; model.matrix() can handle them pretty easily.

For your example data frame, you can get something like:

> X
       I(a^2)      I(b^2)    I(c^2)        a:b        a:c        b:c
1  0.02830988 0.290128663 0.8060044 0.09062841 0.15105592 0.48357521
2  0.78597627 0.451852115 0.1003373 0.59594047 0.28082514 0.21292636
3  0.36190629 0.117679147 0.5325122 0.20637060 0.43899829 0.25033093
4  0.83645938 0.006638227 0.9812959 0.07451582 0.90598796 0.08070976
5  0.50038157 0.197485843 0.6194279 0.31435374 0.55673179 0.34975454
6  0.25813071 0.567147970 0.5028665 0.38262032 0.36028502 0.53404096
7  0.51074360 0.219564943 0.1966824 0.33487518 0.31694526 0.20780897
8  0.37611759 0.752857721 0.3169607 0.53213065 0.34527451 0.48849390
9  0.00562814 0.627098114 0.8408894 0.05940872 0.06879421 0.72616812
10 0.78306385 0.405336110 0.3063323 0.56338624 0.48977313 0.35237413
attr(,"assign")
[1] 1 2 3 4 5 6

I did not set seed, so the numbers may be different when you test.

model.matrix() is meant for constructing the model (design) matrix in regression analysis. In your case you have only numerical data; in fact, it can also handle factor-numeric and factor-factor interactions.
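The formula above is typed out by hand for three columns. For an arbitrary number of columns it could be assembled from the column names; this is only a sketch, assuming at least two numeric columns with syntactically valid names (vars, squares, pairs and f are illustrative names, not part of any API):

```r
set.seed(24)
df1 <- data.frame(a = runif(10), b = runif(10), c = runif(10))

vars    <- colnames(df1)
squares <- paste0("I(", vars, "^2)")                   # "I(a^2)", "I(b^2)", ...
pairs   <- combn(vars, 2, FUN = paste, collapse = ":") # "a:b", "a:c", "b:c"

# build "~ I(a^2) + ... + a:b + ... - 1" as text, then convert to a formula
f <- as.formula(paste("~", paste(c(squares, pairs), collapse = " + "), "- 1"))
X <- model.matrix(f, df1)
colnames(X)
```

With numeric columns only, this should give one column per square and one per pair, named like the terms in the formula.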

Upvotes: 3
