Reputation: 637
I am trying to apply an existing model to a new data set, and I will try to explain it with an example. I am wondering what an elegant way to assess the goodness of fit on the new data would look like.
Basically, I run a regression and obtain a model. With the summary function I obtain the usual output, such as the adjusted R-squared, p-values, etc.
model.lm <- lm(Sepal.Length ~ Petal.Length, data = iris[1:75,])
summary(model.lm)
Now I want to run the predict function on new data and I am curious to know how the model performs on the new data.
pred.dat <- predict(model.lm, newdata = iris[76:150,])
I would like to know how I can, for instance, get an adjusted R-squared for the predicted values on the new data. Is there something similar to the summary function for this? Ideally, I would like to find out what the best practice is for assessing the goodness of fit of an existing model on new data.
Many thanks
Upvotes: 1
Views: 771
Reputation: 24198
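You can translate the formula of R-squared, R^2 = 1 - SS_res / SS_tot, into a function. Here SS_res is the sum of squared differences between the observed and predicted values, and SS_tot is the sum of squared differences between the observed values and their mean, such as: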
r_squared <- function(vals, preds) {
  # The total sum of squares is taken around the mean of the observed values
  1 - sum((vals - preds)^2) / sum((vals - mean(vals))^2)
}
# Test
> r_squared(iris[76:150,]$Sepal.Length, pred.dat)
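As a quick sanity check, here is a minimal, self-contained sketch (the names train, test, in_sample and out_sample are illustrative): for an ordinary least-squares fit with an intercept, applying r_squared to the training data and the model's fitted values should reproduce the Multiple R-squared that summary() reports, and the same function then gives the out-of-sample value on the new data.

```r
# R-squared with the total sum of squares taken around the observed mean
r_squared <- function(vals, preds) {
  1 - sum((vals - preds)^2) / sum((vals - mean(vals))^2)
}

train <- iris[1:75, ]
test  <- iris[76:150, ]
model.lm <- lm(Sepal.Length ~ Petal.Length, data = train)

# In-sample: should agree with summary() up to floating-point error
in_sample <- r_squared(train$Sepal.Length, fitted(model.lm))
all.equal(in_sample, summary(model.lm)$r.squared)

# Out-of-sample: the same formula applied to predictions on the new data
pred.dat <- predict(model.lm, newdata = test)
out_sample <- r_squared(test$Sepal.Length, pred.dat)
out_sample
```

Unlike the in-sample value, the out-of-sample R-squared is not bounded below by 0: it becomes negative when the predictions are worse than simply using the mean of the new observations.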
Building on this function and the standard formula, R^2_adj = 1 - (1 - R^2)(n - 1) / (n - k - 1), we can also define the adjusted R-squared as:
r_squared_a <- function(vals, preds, k) {
  n <- length(preds)
  1 - ((1 - r_squared(vals, preds)) * (n - 1)) / (n - k - 1)
}
where n is the number of observations and k is the number of predictors. Thus:
> r_squared_a(iris[76:150,]$Sepal.Length, pred.dat, 1)
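The adjusted formula can be checked the same way. This is a minimal, self-contained sketch (adj_in_sample is an illustrative name): with the training data and fitted values, r_squared_a should match the Adjusted R-squared that summary() reports, provided k counts the predictors excluding the intercept.

```r
# R-squared with the total sum of squares taken around the observed mean
r_squared <- function(vals, preds) {
  1 - sum((vals - preds)^2) / sum((vals - mean(vals))^2)
}

# Adjusted R-squared; k is the number of predictors (excluding the intercept)
r_squared_a <- function(vals, preds, k) {
  n <- length(vals)
  1 - (1 - r_squared(vals, preds)) * (n - 1) / (n - k - 1)
}

train <- iris[1:75, ]
model.lm <- lm(Sepal.Length ~ Petal.Length, data = train)

# k = 1 because the model has a single predictor, Petal.Length
adj_in_sample <- r_squared_a(train$Sepal.Length, fitted(model.lm), k = 1)
all.equal(adj_in_sample, summary(model.lm)$adj.r.squared)
```

Once the in-sample values agree with summary(), the same calls can be pointed at iris[76:150,] and the predictions from predict().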
Upvotes: 4