User User
User User

Reputation: 43

Linear regression for each cell in a table

I have four tables. Each of them got 4 rows and 4 columns. Followings are the four tables.

For the 1st table,

t1 <- array(1:20, dim=c(4,4))


    [,1] [,2] [,3] [,4] 
[1,]    1    5    9   13 
[2,]    2    6   10   14 
[3,]    3    7   11   15 
[4,]    4    8   12   16

For the 2nd table,

t2 <- array(6:25, dim=c(4,4))

     [,1] [,2] [,3] [,4]
[1,]    6   10   14   18
[2,]    7   11   15   19
[3,]    8   12   16   20
[4,]    9   13   17   21

For the 3rd table,

t3 <- array(11:30, dim=c(4,4))

     [,1] [,2] [,3] [,4]
[1,]   11   15   19   23
[2,]   12   16   20   24
[3,]   13   17   21   25
[4,]   14   18   22   26

For the 4th table,

t4 <- array(21:30, dim=c(4,4))

     [,1] [,2] [,3] [,4]
[1,]   21   25   29   23
[2,]   22   26   30   24
[3,]   23   27   21   25
[4,]   24   28   22   26

For each tables, I got a fixed set of y-value.

t1 = 0.1 
t2 = 3
t4 = 0.5
t6 = 7

In other words:

y <- c( 0.1, 3, 0.75, 7)

Then, I want to extract x values from each of the cell in the four tables. That is for the [1,1] cell, the x-values extacted should be (0.1, 3, 0.5, 7). We repeats this step one by one till the end of the table, i.e. the [4,4] cell. Thus, I got a total of 16 sets of x-values as folows:

cell   x-values
[1,1]  (1,6,11,21) 
[1,2]  (5,10,15,25) 
…..
[4,4]  (16, 21,26,26)

Then I try to calculate the R2 for linear regression for each y-x pairs. In other word, I want to got a total of 16 R2 values as follows:

For [1,1] cell, linear regression between (0.1, 3, 0.5, 7) and (1,6,11,21) = 0.6853
For [1,2] cell, linear regression between (0.1, 3, 0.5, 7) and (5,10,15,25) = 0.6853 
…..
For [4,4] cell, linear regression between (0.1, 3, 0.5, 7) and (16, 21,26,26) = 0.2719 

Finally, I want to get a table with the following two columns

cell   R2 
[1,1] 0.6853
[1,2] 0.6853
….
[4,4] 0.2719

I learnt that to do linear regression for x and y series of data, I can use following command:

Rcoefficient <- summary(lm(y ~ x, data=faithful))$r.squared

However, I have trouble readin each set of x-values from the four tables. I tried to use reshape, but I still cannot get it right. Could experts in Stackoverflow, help to suggest an efficient way to do it with R, as my real tables are very large with over 1000 columns and rows.

Thanks a lot.

Upvotes: 2

Views: 401

Answers (2)

January
January

Reputation: 17090

First, let's reformat the data.

EDIT: this code is less then optimal, see Gavins solution in the other answer.

t <- NULL
for( row in 1:nrow( t1 ) ) {
  for( col in 1:ncol( t1 ) ) {
    t <- rbind( t, c( t1[ row, col ], t2[ row, col ], t3[ row, col ], t4[ row, col ] ) )
   }
 }

This will produce a matrix with four columns (one for each table), and nrow * ncol rows - as many rows as you have cells in one table. Check it with dim( t ). Running regression is now easy:

apply( t, 1, function( x ) { summary( lm( y ~ x ) )$r.squared )

Upvotes: -2

Gavin Simpson
Gavin Simpson

Reputation: 174813

I would[*] manipulate the arrays in place, by concatenating them into a 4 x 4 x 4 array:

t1 <- array(1:20, dim=c(4,4))
t2 <- array(6:25, dim=c(4,4))
t3 <- array(11:30, dim=c(4,4))
t4 <- array(21:30, dim=c(4,4))

tt <- array(c(t1,t2,t3,t4), dim = c(4,4,4))
## now you can remove the original arrays

which gives:

> tt
, , 1

     [,1] [,2] [,3] [,4]
[1,]    1    5    9   13
[2,]    2    6   10   14
[3,]    3    7   11   15
[4,]    4    8   12   16

, , 2

     [,1] [,2] [,3] [,4]
[1,]    6   10   14   18
[2,]    7   11   15   19
[3,]    8   12   16   20
[4,]    9   13   17   21

, , 3

     [,1] [,2] [,3] [,4]
[1,]   11   15   19   23
[2,]   12   16   20   24
[3,]   13   17   21   25
[4,]   14   18   22   26

, , 4

     [,1] [,2] [,3] [,4]
[1,]   21   25   29   23
[2,]   22   26   30   24
[3,]   23   27   21   25
[4,]   24   28   22   26

Then we use aperm() to rearrange the dimensions of the array so that the indices you requested are in the right order. We create a matrix from this array as a final step.

X <- matrix(aperm(tt, c(3,1,2)), ncol = 4, byrow = TRUE)

The aperm(tt, c(3,1,2)) step produces

> aperm(tt, c(3,1,2))
, , 1

     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    6    7    8    9
[3,]   11   12   13   14
[4,]   21   22   23   24

, , 2

     [,1] [,2] [,3] [,4]
[1,]    5    6    7    8
[2,]   10   11   12   13
[3,]   15   16   17   18
[4,]   25   26   27   28

, , 3

     [,1] [,2] [,3] [,4]
[1,]    9   10   11   12
[2,]   14   15   16   17
[3,]   19   20   21   22
[4,]   29   30   21   22

, , 4

     [,1] [,2] [,3] [,4]
[1,]   13   14   15   16
[2,]   18   19   20   21
[3,]   23   24   25   26
[4,]   23   24   25   26

where the indices you want are in columns, which we exploit when creating the matrix as R will treat the permuted array as a vector filled from the columns of the permuted array. X results in

> X
      [,1] [,2] [,3] [,4]
 [1,]    1    6   11   21
 [2,]    2    7   12   22
 [3,]    3    8   13   23
 [4,]    4    9   14   24
 [5,]    5   10   15   25
 [6,]    6   11   16   26
 [7,]    7   12   17   27
 [8,]    8   13   18   28
 [9,]    9   14   19   29
[10,]   10   15   20   30
[11,]   11   16   21   21
[12,]   12   17   22   22
[13,]   13   18   23   23
[14,]   14   19   24   24
[15,]   15   20   25   25
[16,]   16   21   26   26

Then we can proceed as per @January's answer and fit the regression (though note I explicitly pass in y as the scoping rules of lm() are non-standard and I'm being defensive.)

y <- c( 0.1, 3, 0.75, 7)
r2 <- apply(X, 1, function(x, y) summary(lm(y ~ x))$r.squared, y = y)

This results in:

> head(r2)
[1] 0.7160542 0.7160542 0.7160542 0.7160542 0.7160542 0.7160542

Note that there is an inconsistency in your text and code. You state the response is (0.1, 3, 0.5, 7) but define y as c( 0.1, 3, 0.75, 7). The results I show use the latter but your results used the former, hence the difference.

[*] without knowing more about the context I'm not sure I'd be wanting to fit millions of linear models...

Upvotes: 7

Related Questions