Julian Jaramillo
Julian Jaramillo

Reputation: 1

Calculating R squared from multiple columns

I'm very new to R and have been trying to figure out how to calculate R^2 from a few columns within a large data set of approx 300+ columns.

Example:

rcalc <- data.frame('x1' = c(694, 702, 701), 'x2'=c(652, 659, 655),
 'x3'=c(614, 612, 613), 'y1'= c(17.97, 17.95, 17.96), 'y2' = c(12.03, 12.0,
 12.1), 'y3' = c(0.09, 0.1, 0.1))

From here I am stuck.

The formula in excel I can do, and looks like this:

RSQ(X1:X3, Y1:Y3) or RSQ(694:652:614, 17.97:12.03:0.09)

So, each row needs to be calculated for R^2. I was able to use the 'lm' command but was only able to do this for 1 row: I had to take the value from each column of x (x1:x3) and stack them into 1 column, then each value from each column y (y1:y3) and stack into 1 column. Then performed the following:

rsqrd = lm(x~y, data=rcalc)
summary(rsqrd)$r.squared

This worked but again, only for 1 row. I'm not sure how to do this for thousands of rows. I hope this wasn't too confusing. Any help is greatly appreciated.

Troubleshooting:

with pivot_longer:

row col obs value
1   c   300_0 DUT Ip2_comp  784.9775
1   c   300_12 DUT Ip2_comp 864.4234
1   c   300_18 DUT Ip2_comp 919.3384
1   c   300_0 REF O2    0.09
1   c   300_12 REF O2   11.95
1   c   300_18 REF O2   17.98
2   c   300_0 DUT Ip2_comp  781.5785
2   c   300_12 DUT Ip2_comp 865.5541
2   c   300_18 DUT Ip2_comp 921.0646
2   c   300_0 REF O2    0.09

With Pivot_wider:

row obs c
1   300_0 DUT Ip2_comp  784.9775
1   300_12 DUT Ip2_comp 864.4234
1   300_18 DUT Ip2_comp 919.3384
1   300_0 REF O2    0.09
1   300_12 REF O2   11.95
1   300_18 REF O2   17.98
2   300_0 DUT Ip2_comp  781.5785
2   300_12 DUT Ip2_comp 865.5541
2   300_18 DUT Ip2_comp 921.0646

Upvotes: 0

Views: 1333

Answers (1)

Jon Spring
Jon Spring

Reputation: 66415

I'm sure this could be done more concisely, but here's one approach using tidyverse functions. First, I do some reshaping to add a row number and make it into a longer shape, with columns for row, observation # (1-3), x, and y.

Then I "nest" all the data except row number so that I can run a separate regression on each row's data, and then extract r squared (and a variety of other stats) from each regression.

library(tidyverse)


rcalc %>%  # your data

  # reshape to get matched columns for all x and for all y values
  mutate(row = row_number()) %>%
  pivot_longer(-row, names_to = c("col", "obs"), names_sep = 1) %>%  # split column name into two fields after first character
  pivot_wider(names_from = col, values_from = value) %>% 

  # nest data, regression, unnest
  nest(-row) %>%
  mutate(model = map(data, function(df) lm(y ~ x, data = df)),
         tidied = map(model, broom::glance)) %>%
  unnest(tidied)

Result

# A tibble: 3 x 15
    row data             model  r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC deviance df.residual  nobs
  <int> <list>           <list>     <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1     1 <tibble [3 × 3]> <lm>       0.952         0.905  2.81      20.0   0.140     1  -5.71  17.4  14.7     7.91           1     3
2     2 <tibble [3 × 3]> <lm>       0.973         0.946  2.10      36.3   0.105     1  -4.84  15.7  13.0     4.43           1     3
3     3 <tibble [3 × 3]> <lm>       0.951         0.903  2.84      19.6   0.141     1  -5.74  17.5  14.8     8.05           1     3

Edit: for troubleshooting, I am adding here the results I see at each stage:

after the pivot_longer step:

# A tibble: 18 x 4
     row col   obs    value
   <int> <chr> <chr>  <dbl>
 1     1 x     1     694   
 2     1 x     2     652   
 3     1 x     3     614   
 4     1 y     1      18.0 
 5     1 y     2      12.0 
 6     1 y     3       0.09
 7     2 x     1     702   
 8     2 x     2     659   
 9     2 x     3     612   
10     2 y     1      18.0 
11     2 y     2      12   
12     2 y     3       0.1 
13     3 x     1     701   
14     3 x     2     655   
15     3 x     3     613   
16     3 y     1      18.0 
17     3 y     2      12.1 
18     3 y     3       0.1 

after the pivot_wider step:

# A tibble: 9 x 4
    row obs       x     y
  <int> <chr> <dbl> <dbl>
1     1 1       694 18.0 
2     1 2       652 12.0 
3     1 3       614  0.09
4     2 1       702 18.0 
5     2 2       659 12   
6     2 3       612  0.1 
7     3 1       701 18.0 
8     3 2       655 12.1 
9     3 3       613  0.1 

Upvotes: 0

Related Questions