Reputation: 45
I am working with a very small sample (10-15 presences) and trying to build an ensemble SDM with the tidysdm package. Random k-fold cross-validation won't work here; with samples this small, jackknife (leave-one-out) cross-validation is considered best practice.
tidysdm uses the spatialsample package, and I think the jackknife function is "spatial_leave_location_out_cv". However, it requires a "group" argument, and I cannot figure out what to supply for it.
Right now I have an sf data frame with 10,010 rows in total (10 presences and 10,000 background points) and the following columns: class (presence vs. background), geometry (the lat/lon location data), and 7 predictor variables with values extracted from the rasters.
I have tried supplying the class and geometry columns as the group argument. The models failed to run with class as the group. With geometry as the group, the models ran for about 8 hours without getting even halfway, so I terminated the session.
What is the proper way to run this function for jackknife CV? Here is my code if it helps:
```
dive.cv <- spatial_leave_location_out_cv(data = dive.vars1, group = jackknife, v = NULL)
autoplot(dive.cv)
```
Upvotes: 0
Views: 21
Reputation: 321
To run a leave-one-out jackknife with `spatial_leave_location_out_cv`, you will need to set up a grouping variable such that each group contains a single presence and an appropriate number of background points. Here is a simple reprex using the `lacerta` dataset in `tidysdm`, subsetted to just 3 presences and 2 background points per presence.
library(tidysdm)
#> Loading required package: tidymodels
#> Loading required package: spatialsample
lacerta_thin <- readRDS(system.file("extdata/lacerta_thin_all_vars.rds",
  package = "tidysdm"
))
########
# create a small dataset for the reprex
n_pres <- 3 # number of presences
n_bkg_per_pres <- 2 # number of background points per presence
set.seed(123)
lacerta_small <- rbind(
  lacerta_thin %>% filter(class == "presence") %>%
    sample_n(size = n_pres),
  lacerta_thin %>% filter(class == "background") %>%
    sample_n(size = n_pres * n_bkg_per_pres)
)
########
# now create groups, 1 per presence, each with
# n_bkg_per_pres background points
lacerta_small$group <- NA
lacerta_small$group[lacerta_small$class == "presence"] <- 1:n_pres
lacerta_small$group[lacerta_small$class == "background"] <-
  sample(rep(1:n_pres, each = n_bkg_per_pres), replace = FALSE)
########
# set up the folds for the jackknife
lacerta_cv <- spatial_leave_location_out_cv(
  data = lacerta_small,
  group = group
)
# confirm that we have the right balance of presence and background points
check_splits_balance(lacerta_cv, class)
#> # A tibble: 3 × 4
#>   presence_assessment background_assessment presence_analysis
#>                 <int>                 <int>             <int>
#> 1                   2                     4                 1
#> 2                   2                     4                 1
#> 3                   2                     4                 1
#> # ℹ 1 more variable: background_analysis <int>
autoplot(lacerta_cv)
Created on 2025-02-28 with reprex v2.1.1
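
For the data described in the question (10 presences and 10,000 background points), the same grouping idea could be sketched roughly as below. This is only an illustration in base R: `dive_demo` is a hypothetical plain data frame standing in for the real sf object, and the column name `group` is an assumption. Background points are spread evenly and at random across the groups, so each group ends up with 1 presence and 1,000 background points.

```r
# Hypothetical stand-in for the data in the question:
# 10 presences and 10,000 background points.
# (A plain data frame is used here; the real object would be an sf data frame.)
set.seed(42)
n_pres <- 10
n_bkg <- 10000
dive_demo <- data.frame(
  class = factor(
    rep(c("presence", "background"), times = c(n_pres, n_bkg)),
    levels = c("presence", "background")
  )
)

# One group per presence
dive_demo$group <- NA_integer_
dive_demo$group[dive_demo$class == "presence"] <- seq_len(n_pres)

# Spread the background points evenly across the groups, in random order
dive_demo$group[dive_demo$class == "background"] <-
  sample(rep(seq_len(n_pres), length.out = n_bkg))

# Each group should hold 1 presence and n_bkg / n_pres background points
group_counts <- table(dive_demo$group, dive_demo$class)
print(group_counts)

# With the real sf object, the folds would then be built as in the reprex above:
# dive_cv <- spatial_leave_location_out_cv(data = dive.vars1, group = group)
```

Because the number of folds then equals the number of groups, this yields 10 folds (one per presence) rather than one fold per row.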
Upvotes: 0