David

Reputation: 15

r-squared by groups in linear regression

I have fitted a linear regression using all the elements of my dataset (24), and the resulting model is IP2. Now I want to know how well that single model fits each country in my dataset (only the r-squared; I am not interested in the slope and intercept). The awkward way to do it is the following, which I would need to repeat 200 times:

Country <- c("A","A","A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","B","B")
IP <- c(55,56,59,63,67,69,69,73,74,74,79,87,0,22,24,26,26,31,37,41,43,46,46,47)
IP2 <- c(46,47,49,50,53,55,53,57,60,57,58,63,0,19,20,21,22,25,26,28,29,30,31,31)
summary(lm(IP[Country=="A"] ~ IP2[Country=="A"]))
summary(lm(IP[Country=="B"] ~ IP2[Country=="B"]))

Is there a way of calculating both r-squared values at the same time? I tried the approaches from "Linear Regression and group by in R" and from some other posts ("Fitting several regression models with dplyr"), but they did not work: I get the same coefficients for the four groups I am working with. Any idea what I am doing wrong or how to solve the problem? Thank you.

Upvotes: 1

Views: 2008

Answers (2)

Mark

Reputation: 4537

You can use the split function and then mapply to accomplish this.

  • split takes a vector and turns it into a list with k elements, where k is the number of distinct levels of (in this case) Country.
  • mapply allows us to loop over multiple inputs.
  • getR2 is a simple function that takes two inputs, fits a model and then extracts the R^2 value.

Code example below:

Country <- c("A","A","A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","B","B")
IP <- c(55,56,59,63,67,69,69,73,74,74,79,87,0,22,24,26,26,31,37,41,43,46,46,47)
IP2 <- c(46,47,49,50,53,55,53,57,60,57,58,63,0,19,20,21,22,25,26,28,29,30,31,31)

ip_split = split(IP,Country)
ip2_split = split(IP2,Country)

getR2 = function(ip,ip2){
  model = lm(ip~ip2)
  return(summary(model)$r.squared)
}

r2.values = mapply(getR2,ip_split,ip2_split)

r2.values
#>         A         B 
#> 0.9451881 0.9496636

Upvotes: 0

Julius Vainora

Reputation: 48251

A few options with base R:

sapply(unique(Country), function(cn)
  summary(lm(IP[Country == cn] ~ IP2[Country == cn]))$r.sq)
#         A         B 
# 0.9451881 0.9496636 

and

c(by(data.frame(IP, IP2), Country, function(x) summary(lm(x))$r.sq))
#         A         B 
# 0.9451881 0.9496636 

or

sapply(split(data.frame(IP, IP2), Country), function(x) summary(lm(x))$r.sq)
#         A         B 
# 0.9451881 0.9496636 

Upvotes: 1
