Reputation: 7
I have a data frame with 46 variables, and I would like to create a subset for every possible combination of 2 variables.
For example, given a data frame with 3 variables "A", "B", "C", my goal would be 3 subsets: A and B, A and C, B and C.
I would like to use each of those subsets as the covariates of a regression model, so that I can try every combination of 2 variables as covariates.
All I can think of is using a loop, but I would appreciate it if anyone could show me how to do it!
Upvotes: 0
Views: 528
Reputation: 6307
Following on from the comments, you can do this with nested loops.
This will loop over the data and print each pair exactly once, without duplicates:
# your data
char_vec <- c("A", "B", "C", "D")
# index for the outer loop
i <- 1
# stop at length - 1 because the last value alone cannot form a pair
while (i <= length(char_vec) - 1) {
  # index for the inner loop
  # start at i + 1 so that no pair is repeated
  j <- i + 1
  while (j <= length(char_vec)) {
    # print the pair, or do whatever you need with it
    # sep = "" joins the values without a space between them
    print(paste(char_vec[i], char_vec[j], sep = ""))
    # move on to the next inner value
    j <- j + 1
  }
  # move on to the next outer value
  i <- i + 1
}
And the output looks like this:
[1] "AB"
[1] "AC"
[1] "AD"
[1] "BC"
[1] "BD"
[1] "CD"
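The same index pattern can also collect the actual two-column subsets rather than printing names. The following is only a sketch: the data frame `df` and the list name `subsets` are made up for illustration, and `for` loops stand in for the `while` loops above.

```r
# Hypothetical example data; the question's real data frame has 46 columns
df <- data.frame(A = 1:5, B = 6:10, C = 11:15)
vars <- names(df)

# One list entry per unordered pair of columns
subsets <- list()
for (i in seq_len(length(vars) - 1)) {
  # start at i + 1 so that each pair appears only once
  for (j in seq(i + 1, length(vars))) {
    key <- paste(vars[i], vars[j], sep = "_")
    subsets[[key]] <- df[, c(vars[i], vars[j])]
  }
}

names(subsets)
#> [1] "A_B" "A_C" "B_C"
```

Each element of `subsets` is a data frame with two columns, so it can be passed on to whatever model-fitting step comes next.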
Upvotes: 0
Reputation: 41240
combn could help prepare the list of combinations:
apply(combn(c("A","B","C"),2),2,function(x) as.formula(paste0("y~",x[1],'+',x[2])))
[[1]]
y ~ A + B
<environment: 0x0000027286e851c8>
[[2]]
y ~ A + C
<environment: 0x000002728897a380>
[[3]]
y ~ B + C
<environment: 0x000002728692adc0>
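As a possible variation, the formulas can be built without pasting strings: `combn` accepts a `FUN` argument, and `reformulate` from base R's stats package turns a vector of term labels into a formula. This is a sketch under the same assumption as above, namely that the response is called `y`. The names `vars`, `formulas` and `n_pairs` are illustrative.

```r
vars <- c("A", "B", "C")

# Apply reformulate() to each pair; simplify = FALSE keeps the result as a list
formulas <- combn(
  vars, 2,
  FUN = function(x) reformulate(x, response = "y"),
  simplify = FALSE
)

deparse(formulas[[1]])
#> [1] "y ~ A + B"

# Number of pairs to expect if all 46 variables are candidate covariates
n_pairs <- choose(46, 2)
n_pairs
#> [1] 1035
```

If one of the 46 columns is the response, the count drops to `choose(45, 2)`, which is 990.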
You could then use lapply to evaluate the different formulas.
For example, with mtcars:
variables <- setdiff(colnames(mtcars),"cyl")
cbn <- apply(combn(variables,2),2,function(x) as.formula(paste0("cyl~",x[1],'+',x[2])))
lapply(cbn,function(x) {summary(eval(substitute(lm(y,mtcars),list(y=x))))})
#> [[1]]
#>
#> Call:
#> lm(formula = cyl ~ mpg + disp, data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.3002 -0.6138 0.1776 0.5486 1.1406
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 5.917863 1.255293 4.714 5.61e-05 ***
#> mpg -0.092206 0.041352 -2.230 0.0337 *
#> disp 0.009198 0.002011 4.574 8.27e-05 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.7364 on 29 degrees of freedom
#> Multiple R-squared: 0.8409, Adjusted R-squared: 0.83
#> F-statistic: 76.66 on 2 and 29 DF, p-value: 2.647e-12
#>
#>
#> [[2]]
#>
#> Call:
#> lm(formula = cyl ~ mpg + hp, data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.5641 -0.4721 -0.1099 0.6273 1.3585
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 7.629183 1.226285 6.221 8.69e-07 ***
#> mpg -0.153574 0.039052 -3.933 0.00048 ***
#> hp 0.011205 0.003433 3.264 0.00281 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.8263 on 29 degrees of freedom
#> Multiple R-squared: 0.7998, Adjusted R-squared: 0.7859
#> F-statistic: 57.91 on 2 and 29 DF, p-value: 7.459e-11
#>
#>
#> [[3]]
#>
#> Call:
#> lm(formula = cyl ~ mpg + drat, data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.8180 -0.4772 0.2271 0.6694 1.3862
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 13.03441 1.15565 11.279 4.02e-12 ***
#> mpg -0.20753 0.03737 -5.554 5.45e-06 ***
#> drat -0.74449 0.42121 -1.767 0.0877 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.918 on 29 degrees of freedom
#> Multiple R-squared: 0.7528, Adjusted R-squared: 0.7358
#> F-statistic: 44.16 on 2 and 29 DF, p-value: 1.581e-09
#>
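With many pairs, printing every summary quickly becomes hard to read. One option, sketched here as an assumption about what is useful rather than as part of the original answer, is to keep one number per model (adjusted R-squared in this example) and sort by it. The names `pairs`, `fits` and `results` are illustrative.

```r
variables <- setdiff(colnames(mtcars), "cyl")

# List of character vectors, one per pair of covariates
pairs <- combn(variables, 2, simplify = FALSE)

# Fit one linear model per pair
fits <- lapply(pairs, function(p) {
  lm(reformulate(p, response = "cyl"), data = mtcars)
})

# One row per model: the covariates used and the adjusted R-squared
results <- data.frame(
  covariates = vapply(pairs, paste, character(1), collapse = " + "),
  adj_r2 = vapply(fits, function(f) summary(f)$adj.r.squared, numeric(1))
)

# Best-fitting pairs first
head(results[order(-results$adj_r2), ])
```

Because `lm` is called directly here, the `Call:` line of each fit shows the `reformulate(...)` expression rather than the formula itself. The `eval(substitute(...))` construction above avoids that, which is why its output shows `cyl ~ mpg + disp`.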
Upvotes: 1