Reputation: 370
I am computing the R-squared for multiple pairs of columns in a data frame. I can do this by writing out the code for each pair individually, but I want to automate it with apply or some other vectorized approach, based on the pattern of columns I am choosing from the data frame.
Sample data:
set.seed(1234)
dat <- data.frame(replicate(18,rnorm(10)))
To get the r-squared for column 1 v. 2:
fit <- lm(dat[,1] ~ dat[,2])
summary(fit)$r.squared
But I would like to do all of the following pairs: {1, 2}, {2, 3}, {3, 1}, {4, 5}, {5, 6}, {6, 4}, ... and so on through the 18th column.
In other words, I want every pair within a block of three consecutive columns, with the window then moving over to the next block of three. That way I can call the function once on the whole data frame and get all the R-squared values at once, instead of repeating the code 18 times.
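The pattern described above can be written down as an index matrix before any model is fitted. This is a minimal sketch, assuming 18 columns; the names `n_col`, `starts` and `pairs` are hypothetical and chosen only for illustration.

```r
# Hypothetical sketch: (response, predictor) column indices for 18 columns,
# three pairs per block of three consecutive columns
n_col <- 18
starts <- seq(1, n_col, by = 3)  # first column of each block: 1, 4, 7, ...

# Within the block (s, s+1, s+2) the pairs are s~s+1, s+1~s+2, s+2~s
pairs <- do.call(rbind, lapply(starts, function(s) {
  cbind(response = s + 0:2, predictor = s + c(1, 2, 0))
}))

head(pairs, 6)
```

Each row of `pairs` then corresponds to one call of `lm()`.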
Upvotes: 1
Views: 142
Reputation: 23788
You can try this:
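v1 <- seq_len(ncol(dat))     # response columns 1..18
v2 <- v1 + c(1L, 1L, -2L)    # partner columns 2, 3, 1, 5, 6, 4, ... (offsets are recycled)
m <- cbind(v1, v2)           # one row per pair
# fit one model per row of m, then extract each R-squared
fit <- lapply(seq_len(nrow(m)), function(x) lm(dat[, m[x, 1]] ~ dat[, m[x, 2]]))
rsq <- sapply(fit, function(f) summary(f)$r.squared)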
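The offsets `c(1L, 1L, -2L)` are recycled along `v1`, so they only line up with the blocks of three when the number of columns is a multiple of 3 (otherwise R warns that the longer object length is not a multiple of the shorter one). A hedged variant that checks this up front and labels each value with its pair might look like this; the guard and the naming scheme are additions, not part of the answer above.

```r
set.seed(1234)
dat <- data.frame(replicate(18, rnorm(10)))

# The recycled offsets only match the blocks of three if this holds
stopifnot(ncol(dat) %% 3 == 0)

v1 <- seq_len(ncol(dat))
v2 <- v1 + c(1L, 1L, -2L)

# One R-squared per column, typed as numeric by vapply
rsq <- vapply(v1, function(i) {
  summary(lm(dat[[i]] ~ dat[[v2[i]]]))$r.squared
}, numeric(1))

# Label each value with the pair it came from, e.g. "X3 ~ X1"
names(rsq) <- paste0("X", v1, " ~ X", v2)
```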
Upvotes: 1
Reputation: 16121
An alternative approach using the dplyr package:
set.seed(1234)
dat <- data.frame(replicate(18,rnorm(10)))
library(dplyr)
data.frame(colnames = names(dat)) %>% # get the names of columns
mutate(group = cumsum(ifelse(row_number() %in% seq(1,ncol(dat),3),1,0))) %>% # create group id based on 3 consecutive columns
group_by(group) %>% # for each group id
do({cb = combn(.$colnames,2) # create combinations of column names
data.frame(col1 = cb[1,],
col2 = cb[2,])}) %>%
mutate(formula = paste(col1,"~",col2)) %>% # create a formula for each combination
rowwise() %>% # for each row/formula
do(data.frame(formula = .$formula,
r.sq = summary(lm(.$formula, data=dat))$r.squared)) # create model and get r squared
# formula r.sq
# (chr) (dbl)
# 1 X1 ~ X2 3.072421e-02
# 2 X1 ~ X3 3.056746e-01
# 3 X2 ~ X3 7.708176e-02
# 4 X4 ~ X5 7.293980e-01
# 5 X4 ~ X6 3.244157e-01
# 6 X5 ~ X6 2.231886e-01
# 7 X7 ~ X8 6.637355e-03
# 8 X7 ~ X9 1.497414e-06
# 9 X8 ~ X9 9.758725e-02
# 10 X10 ~ X11 2.728225e-01
# 11 X10 ~ X12 5.973809e-02
# 12 X11 ~ X12 1.196112e-01
# 13 X13 ~ X14 5.541950e-02
# 14 X13 ~ X15 3.488573e-02
# 15 X14 ~ X15 2.519877e-02
# 16 X16 ~ X17 7.004510e-04
# 17 X16 ~ X18 8.827935e-02
# 18 X17 ~ X18 1.112862e-01
If you prefer, you can replace
mutate(group = cumsum(ifelse(row_number() %in% seq(1,ncol(dat),3),1,0)))
(which starts a new group at columns 1, 4, 7, ...) with
mutate(group = ntile(row_number(), 6))
(which splits the 18 columns into 6 groups of 3 consecutive columns; for a different number of columns, use ncol(dat) / 3 groups).
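The same idea (group the columns in threes, take all pairs within each group with `combn`, fit one formula per pair) can be sketched in base R without dplyr. This is an assumption-laden translation, not the answerer's code; `group`, `formulas` and `res` are hypothetical names, and `ceiling(i / 3)` is used as a substitute for the `cumsum()` trick, which the `stopifnot()` line checks gives the same ids.

```r
set.seed(1234)
dat <- data.frame(replicate(18, rnorm(10)))

i <- seq_along(dat)  # column positions 1..18

# Group id per column: 1,1,1,2,2,2,...; same ids as the cumsum() trick
group <- ceiling(i / 3)
stopifnot(all(group == cumsum(ifelse(i %in% seq(1, ncol(dat), 3), 1, 0))))

# All pairs of column names within each group, as "Xi ~ Xj" strings
formulas <- unlist(lapply(split(names(dat), group), function(g) {
  cb <- combn(g, 2)
  paste(cb[1, ], "~", cb[2, ])
}), use.names = FALSE)

# Fit each formula against dat and keep its R-squared
res <- data.frame(
  formula = formulas,
  r.sq = vapply(formulas, function(f) {
    summary(lm(as.formula(f), data = dat))$r.squared
  }, numeric(1), USE.NAMES = FALSE)
)
```

The rows come out in the same order as the dplyr output (X1 ~ X2, X1 ~ X3, X2 ~ X3, ...).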
Upvotes: 0
Reputation: 7248
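If you just need the R-squared values, you can use the cor function to get the correlation matrix. For a simple linear regression with one predictor and an intercept, R-squared is just the square of the Pearson correlation, so the values you need are the squares of the corresponding entries of that matrix.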
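A minimal sketch of that idea, assuming the same windowed pairs as in the question: square the whole correlation matrix once, then pull out the needed entries with two-column matrix indexing. The names `r2`, `v1`, `v2` and `rsq` are illustrative.

```r
set.seed(1234)
dat <- data.frame(replicate(18, rnorm(10)))

# Squared Pearson correlations for every pair of columns
r2 <- cor(dat)^2

# Pick out the windowed pairs {1,2}, {2,3}, {3,1}, {4,5}, ...
v1 <- seq_len(ncol(dat))
v2 <- v1 + c(1L, 1L, -2L)
rsq <- r2[cbind(v1, v2)]

# Agrees with lm() for a single-predictor model with an intercept
stopifnot(isTRUE(all.equal(rsq[1], summary(lm(dat[, 1] ~ dat[, 2]))$r.squared)))
```

This avoids fitting 18 models, but it only applies to the one-predictor case; with more predictors or no intercept you would still need `lm()`.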
Upvotes: 0
Reputation: 71
Or in one line:
results <- sapply(1:ncol(dat), function(x) summary(lm(dat[, x] ~ dat[, ifelse(x %% 3 != 0, x + 1, x - 2)]))$r.squared)
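The `ifelse()` expression is what encodes the window: every column is paired with the next one, except the last column of each block of three, which wraps back to the first. Evaluating it on its own for the first two blocks makes the mapping visible:

```r
x <- 1:6
# Next column, except multiples of 3 wrap back two columns
partner <- ifelse(x %% 3 != 0, x + 1, x - 2)
partner
# [1] 2 3 1 5 6 4
```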
Upvotes: 3
Reputation: 109
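This computes the R-squared for every pair of columns, i.e. all choose(18, 2) = 153 combinations, not only the pairs within each block of three: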
results <- apply(combn(colnames(dat), 2), 2, function(x)summary(lm(dat[, x[1]] ~ dat[, x[2]]))$r.squared)
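To restrict the `combn()` approach to the pairs the question asks for, one option is to generate all index pairs and keep only those whose two columns fall in the same block of three. This is a hedged sketch; `block`, `all_pairs` and `same_block` are hypothetical helper names. It yields {1,3} rather than {3,1}, which does not matter here because with a single predictor and an intercept R-squared is the squared correlation and therefore symmetric in the two columns.

```r
set.seed(1234)
dat <- data.frame(replicate(18, rnorm(10)))

# All 153 pairs of column indices (one pair per column)
all_pairs <- combn(ncol(dat), 2)

# Block number of a column: 1-3 -> 0, 4-6 -> 1, ...
block <- function(i) (i - 1) %/% 3

# Keep only pairs whose two columns share a block
same_block <- all_pairs[, block(all_pairs[1, ]) == block(all_pairs[2, ]), drop = FALSE]

results <- apply(same_block, 2, function(p) {
  summary(lm(dat[, p[1]] ~ dat[, p[2]]))$r.squared
})
```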
Upvotes: 0