Foeke Boersma
Foeke Boersma

Reputation: 127

calculate R2, RMSE, and MAE for cv.krige (gstat) - R

The aim is to get the R2, RMSE, and MAE summaries for the cv.krige with leave-one-out cross validation. To illustrate I have the following code (please note that a random dataset is created and that metrics are poor):

library(Metrics)
library(nlme) 
library(Metrics)
library(gstat)  
library(sf)

## == Create random dataframe == ##

#dependent variable
Lopend_gemiddelde <- c(10,20, 25, 30,40)
#predictors
var1 <- c(600, 350, 600, 200, 100)
var2 <- c(327892.21, 2930482.30, 349090.67, 400045.97, 506700.70)
var3 <- c(100, 300, 650, 820, 1000)
#geoinformation
x <- c(219208.7, 221541.6, 219057.7, 221119.3, 221227.1)
y <- c(5810867, 5809992, 5810245, 5808719, 5808392)
# Join the variables to create a data frame
df <- data.frame(Lopend_gemiddelde, var1, var2, var3, x, y)
df

#make spatial
df_spatial <- st_as_sf(df, crs=28992, coords = c("x", "y"))

## == Kriging - parameter setting == ##

linear_df= lm(Lopend_gemiddelde ~ 1 + var1 + var2 + var3, data=df_spatial)
summary(resid(linear_df))

#to dataframe (use df, rather than df_spatial, for separate geoinformation columns)
data_linear <- data.frame(x = df$x, y = df$y, resid=resid(linear_df))

#variogram - automatically
coordinates(data_linear) = ~x+y

variogram_auto = autofitVariogram(resid ~ 1, data_linear)
plot(variogram_auto)
print(variogram_auto$var_model)

#prints:
#  model    psill    range
#1   Nug 4.437465   0.0000
#2   Sph 1.000000 350.6477

autofit_params <- variogram_auto$var_model #create dataset including variogram parameters


## == cv kriging and models evaluations ==##

#define variogram
m <- vgm(1.0000, "Sph", 350.6477)
# five-fold cross validation:
x_urb <- krige.cv(Lopend_gemiddelde ~ 1 + var1 + var2+ var3, df_spatial, m)
summary(x_urb)
x_urb$var1.pred
x_urb$observed

## == Mean Absolure Error == ##
MAE_urb = mae(x_urb$observed, x_urb$var1.pred)
print(MAE_urb)
summary(MAE_urb)

## == r2 == ##

#2. R SQUARED error metric -- Coefficient of Determination
RSQUARE = function(y_actual,y_predict){
  cor(y_actual,y_predict)^2}

R2_model_urb = RSQUARE(x_urb$observed,x_urb$var1.pred)
print(R2_model_urb)
summary(R2_model_urb)

## == RMSE == ##
RMSE_model_urb = rmse(x_urb$observed, x_urb$var1.pred)
print(RMSE_model_urb)
summary(RMSE_model_urb)

However, when I print the model evaluation summaries (R2, RMSE, MAE), the summaries read:

> summary(MAE_urb)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  95.18   95.18   95.18   95.18   95.18   95.18 
> summary(R2_model_urb)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
7.73e-07 7.73e-07 7.73e-07 7.73e-07 7.73e-07 7.73e-07
> summary(RMSE_model_urb)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  152.1   152.1   152.1   152.1   152.1   152.1 

Since there are multiple folds (5), you would expect that there is variance in the output of the summaries. Therefore, the summaries are possibly only based on the last fold.

How do I get the summaries for all the folds, rather than the last fold, using the mae, rmse from the Metrics library and the RSQUARE function?

A similar post suggests using the automap package for the rmse and mae, however I would like to stick to the rmse, and mae from the Metrics package and the RSQUARE function that I used in the above code.

Upvotes: 0

Views: 424

Answers (1)

Edzer Pebesma
Edzer Pebesma

Reputation: 4121

You don't provide a reprex, so all one can do is guess. I think that your metrics are single value summaries; asking a summary then just summarises that single number, as in

> summary(1/3)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.3333  0.3333  0.3333  0.3333  0.3333  0.3333 

Upvotes: 0

Related Questions