Reputation: 127
The aim is to get the R2, RMSE, and MAE summaries for the cv.krige
with leave-one-out cross validation. To illustrate I have the following code (please note that a random dataset is created and that metrics are poor):
library(Metrics)
library(nlme)
library(Metrics)
library(gstat)
library(sf)
## == Create random dataframe == ##
#dependent variable
Lopend_gemiddelde <- c(10,20, 25, 30,40)
#predictors
var1 <- c(600, 350, 600, 200, 100)
var2 <- c(327892.21, 2930482.30, 349090.67, 400045.97, 506700.70)
var3 <- c(100, 300, 650, 820, 1000)
#geoinformation
x <- c(219208.7, 221541.6, 219057.7, 221119.3, 221227.1)
y <- c(5810867, 5809992, 5810245, 5808719, 5808392)
# Join the variables to create a data frame
df <- data.frame(Lopend_gemiddelde, var1, var2, var3, x, y)
df
#make spatial
df_spatial <- st_as_sf(df, crs=28992, coords = c("x", "y"))
## == Kriging - parameter setting == ##
linear_df= lm(Lopend_gemiddelde ~ 1 + var1 + var2 + var3, data=df_spatial)
summary(resid(linear_df))
#to dataframe (use df, rather than df_spatial, for separate geoinformation columns)
data_linear <- data.frame(x = df$x, y = df$y, resid=resid(linear_df))
#variogram - automatically
coordinates(data_linear) = ~x+y
variogram_auto = autofitVariogram(resid ~ 1, data_linear)
plot(variogram_auto)
print(variogram_auto$var_model)
#prints:
# model psill range
#1 Nug 4.437465 0.0000
#2 Sph 1.000000 350.6477
autofit_params <- variogram_auto$var_model #create dataset including variogram parameters
## == cv kriging and models evaluations ==##
#define variogram
m <- vgm(1.0000, "Sph", 350.6477)
# five-fold cross validation:
x_urb <- krige.cv(Lopend_gemiddelde ~ 1 + var1 + var2+ var3, df_spatial, m)
summary(x_urb)
x_urb$var1.pred
x_urb$observed
## == Mean Absolure Error == ##
MAE_urb = mae(x_urb$observed, x_urb$var1.pred)
print(MAE_urb)
summary(MAE_urb)
## == r2 == ##
#2. R SQUARED error metric -- Coefficient of Determination
RSQUARE = function(y_actual,y_predict){
cor(y_actual,y_predict)^2}
R2_model_urb = RSQUARE(x_urb$observed,x_urb$var1.pred)
print(R2_model_urb)
summary(R2_model_urb)
## == RMSE == ##
RMSE_model_urb = rmse(x_urb$observed, x_urb$var1.pred)
print(RMSE_model_urb)
summary(RMSE_model_urb)
However, when I print the model evaluation summaries (R2, RMSE, MAE), the summaries read:
> summary(MAE_urb)
Min. 1st Qu. Median Mean 3rd Qu. Max.
95.18 95.18 95.18 95.18 95.18 95.18
> summary(R2_model_urb)
Min. 1st Qu. Median Mean 3rd Qu. Max.
7.73e-07 7.73e-07 7.73e-07 7.73e-07 7.73e-07 7.73e-07
> summary(RMSE_model_urb)
Min. 1st Qu. Median Mean 3rd Qu. Max.
152.1 152.1 152.1 152.1 152.1 152.1
Since there are multiple folds (5), you would expect that there is variance in the output of the summaries. Therefore, the summaries are possibly only based on the last fold.
How do I get the summaries for all the folds, rather than the last fold, using the mae
, rmse
from the Metrics
library and the RSQUARE
function?
A similar post suggests using the automap
package for the rmse and mae, however I would like to stick to the rmse, and mae from the Metrics
package and the RSQUARE
function that I used in the above code.
Upvotes: 0
Views: 424
Reputation: 4121
You don't provide a reprex, so all one can do is guess. I think that your metrics are single value summaries; asking a summary then just summarises that single number, as in
> summary(1/3)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.3333 0.3333 0.3333 0.3333 0.3333 0.3333
Upvotes: 0