Reputation: 909
I am attempting to predict tree species composition using Sentinel 2A imagery and forest plot data. I have calculated the proportion of basal area (the cross-sectional area trees of a given species occupy within the plot divided by the total cross-sectional area all tree occupy with in a plot) from the forest plot data and want to predict proportion of basal area by species across the landscape as a raster using the intensity values from Sentinel 2A. Dirichlet regression seems appropriate here, because I have more than 2 categories of proportional data bounded on (0,1) where the summed proportions for each observational unit must be equal to 1. Therefore, I want to model the joint composition of proportional basal area as a function of spectral reflectance intensity using Dirichlet regression, with k-folds crossvalidation using 10 folds and 5 repeats. This seems like a modeling exercise perfectly suited to caret::train()
using a custom function.
I followed the documentation for how to build a custom routine for DirichletReg::DirichReg()
, but I am a bit stumped on how to feed a multivariate response variable to caret::train()
. The DirichletReg::DirichReg()
function requires that response variable be formatted with the DirichletReg::DR_data()
function first prior to modeling. However, when I feed a Dr_data
object to caret::train()
, I get this error:
Error: wrong model type for classification
I think this error is getting thrown because I have passed a multivariate response variable to the y=
argument of caret::train()
and the routine thinks that can only happen with classification algorithms. Does anyone have experience doing multivariate regression? Is there another way to change my custom routine so that caret::train()
will accept multivariate response variables?
Below is my reproducible example:
##Loading Necessary Packages##
library(caret)
library(DirichletReg)
##Creating Fake Data##
set.seed(88)#For reproducibility
#Response variables#
PSME_BA<-rnorm(25,50, 15)
TSHE_BA<-rnorm(25,40,12)
ALRU2_BA<-rnorm(25,20,0.5)
Total_BA<-PSME_BA+TSHE_BA+ALRU2_BA
#Predictor variables#
B1<-runif(25, 0, 2000)
B2<-runif(25, 0, 1800)
B3<-runif(25, 0, 3000)
#Dataset for modeling#
df<-data.frame(PSME=PSME_BA/Total_BA, TSHE=TSHE_BA/Total_BA, ALRU2=ALRU2_BA/Total_BA,
B1=B1, B2=B2, B3=B3)
##Creating a Dirichlet regression modeling routine to feed to caret::train()##
#Method list#
Dreg <- list(type='Regression',
library='DirichletReg',
loop=NULL)
#Parameters element#
prm <- data.frame(parameter=c("model"),
class="character")
Dreg$parameters <- prm
#Grid element#
DregGrid <- function(x, y, len=NULL, search="grid"){
if(search == "grid"){
out <- expand.grid(model=c("common", "alternative"), stringsAsFactors = F) # here force the strings as character,
# othewise get error that the model arguments
# were expecting 'chr' when fitting
}
out
}
Dreg$grid <- DregGrid
#Fit element#
DregFit <- function(x, y, param, ...){
dat <- if(is.data.frame(x)) x else as.data.frame(x)
dat$.outcome <- y
theDots <- list(...)
modelArgs <- c(list(formula = as.formula(".outcome ~ ."), data = dat, link=param$model, type=param$type), theDots)
out <- do.call(DirichletReg::DirichReg, modelArgs)
out$call <- NULL
out
}
Dreg$fit <- DregFit
#Predict element#
DregPred <- function(modelFit, newdata, preProc=NULL, submodels=NULL){
if(!is.data.frame(newdata)) newdata <- as.data.frame(newdata)
DirichletReg::predict.DirichletRegModel(modelFit, newdata)
}
Dreg$predict <- DregPred
#prob element#
DregProb <- function(){
return(NULL)
}
Dreg$prob <- DregProb
##Modeling the data using Dirichlet regression with repeated k-folds cross validation##
trCrtl<-trainControl(method="repeatedcv", number = 10, repeats = 5)
Y<-DR_data(df[,c(1:3)])#Converting the response variables to DR_data format
mod<-train(x=df[,-c(1:3)],Y, method=Dreg,trControl=trCrtl)#Throws error
Upvotes: 3
Views: 205