Reputation: 390
Possibly this is a naive question but did not find a solution. I have a dataframe with count data from field survey and I want to predict species richness using poisson regression. The survey is allocated to grids of equal size but variable number of survey were done in each grid. So I wanted to include 'number of surveys per grid' as offset. The problem is when I want to predict the glm output using raster stack it wants a raster layer for the offset variable (number of surveys per grid). My question is how to incorporate that offset variable into raster stack so that I can produce a spatial prediction (i.e., prediction should be a raster file). Below is my reproducible effort (using fewer variable):
Create the dataframe:
bio2 <- c(12.74220, 14.10092, 13.82644, 14.30550, 15.02780, 14.88224, 13.98853, 14.89524, 15.59887, 13.98664, 14.75405,
15.38178, 14.50719, 15.00427, 12.77741, 13.25432, 12.91208, 15.75312, 15.36683, 13.33202, 12.55190, 14.94755,
13.52424, 14.75273, 14.42298, 15.37897, 12.02472, 15.49786, 14.28823, 13.01982, 13.60521, 15.07687, 14.17427,
13.24491, 14.84833, 13.52594, 13.92113, 11.39738, 14.31446, 12.10239)
bio9 <- c(26.30980, 26.52826, 27.03376, 23.93621, 26.48416, 26.05859, 25.37550, 25.34595, 25.34056, 23.37793, 25.74681,
22.72016, 22.00458, 24.37140, 22.95169, 24.52542, 24.63087, 22.86291, 23.10240, 23.79215, 24.86875, 21.40718,
23.84258, 21.91964, 25.97682, 24.97625, 22.31471, 19.64094, 23.93386, 25.87234, 25.99514, 17.17149, 20.72802,
18.22862, 24.51112, 24.33626, 23.90822, 23.43660, 23.07425, 20.71244)
count <- c(37, 144, 91, 69, 36, 32, 14, 34, 48, 168, 15, 21, 36, 29, 24, 16, 14, 11, 18, 64, 37, 31, 18, 9, 4,
16, 14, 10, 14, 43, 18, 88, 69, 26, 20, 5, 9, 75, 8, 26)
sitesPerGrid <- c(3, 16, 8, 5, 3, 3, 1, 3, 3, 29, 2, 4, 5, 2, 3, 4, 2, 1, 2, 9, 6, 3, 3, 2, 1, 2, 2, 1, 2, 5, 7, 15, 9, 4,
1, 1, 2, 22, 6, 5)
testdf <- data.frame(bio2, bio9, count, sitesPerGrid)
pois1 <- glm(count ~ bio2 + bio9, offset = log(sitesPerGrid), family = poisson (link = "log"), data = testdf)
Spatial prediction:
library(raster)
bio_2 <- bio_9 <- raster(nrow=5,ncol=8, xmn=0, xmx=1,ymn=0,ymx=1)
values(bio_2) <- bio2
values(bio_9) <- bio9
predRas <- stack(bio_2, bio_9)
names(predRas) <- c("bio2", "bio9")
pdPois <- raster::predict(predRas, pois1, type = "response")
#Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = #object$xlevels) :
# variable lengths differ (found for 'bio9')
#In addition: Warning message:
#'newdata' had 16 rows but variables found have 40 rows
I get error
because it expect a raster layer for sitesPerGrid
. But I don't want to use sitesPerGrid
as a predictor.
Based on the comment and answer given by @robertHijmans I have tried using the following code:
pdPois <- raster::predict(predRas, pois1, const = testdf[, "sitesPerGrid"], type = "response")
Again I get the following error:
Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 143811, 40
Upvotes: 1
Views: 345
Reputation: 390
The problem is solved using a raster for the offset variable. The raster is created based on a hypothesis. For example, I want to see the prediction if there is one site per grid, or mean(sitesPerGrid)
or max(sitesPerGrid)
. If my hypothesis is mean(sitesPerGrid)
then the raster for prediction would be:
# make new raster for sitesPerGrid
rasGrid <- bio2
rasGrid[,] <- mean(testdf$sitesPerGrid)
names(rasGrid) <- "sitesPerGrid"
predRas <- stack(bio_2, bio_9, rasGrid)
p <- raster::predict(predRas, pois1, type = "response")
Upvotes: 0
Reputation: 47536
I see that this works, because the number of data points is the same as what was used to fit the model
p <- predict(pois1, as.data.frame(predRas), type = "response")
However, this (taking two data points) does not work:
p <- predict(pois1, as.data.frame(predRas)[1:2,], type = "response")
#Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
# variable lengths differ (found for 'bio9')
#In addition: Warning message:
#'newdata' had 2 rows but variables found have 40 rows
So, irrespective of the raster data, can you (and if so how?) use a model like this to make predictions to (any number of) new data points?
Upvotes: 0