sourmn
sourmn

Reputation: 11

Error in .local(x, p, ...) : more than half of the presence points have NA predictor values

So I am making an ecological distribution map and I have been getting an error code. I am pretty new to R so any suggestions would be helpful I have tried to look around the web but I haven't made much success.

here is the code that I am working with

setwd("~/gfds/Salsola Files")
#loading in the needed library
library(dismo)
library(maptools)
library(raster)
library(sp)
library(rgeos)
library(caret)

#---misc variables manually defined-----------------------------------------------------------------

e <- extent(-180,-60,20,70) #Set extent of area (long_min, long_max, lat_min, lat_max)
n_abs <- 1000 

datafile <- "Salsola Tragus Data.csv"

data <- read.csv(datafile,sep = ",", quote = "\"",dec = ".", fill = TRUE, comment.char = "")

data <- data[,c("Scientific.Names","Latitude","Longitude")]

salsola.coords<-data[,-1]

data.sdf<-data              #spacial data frame for salsola data

coordinates(data.sdf) <- ~Longitude+Latitude

projection(data.sdf) <- CRS('+proj=longlat +datum=WGS84')

#---Load worldclim data
#Load bioclim stack--------------------------------------------------------------------------------------------

worldclim <-stack(getData('worldclim', var='bio', res=10))
#resolution 10 looks high enough for us
#Crop worldclim to area of interest 

worldclim.extent <-stack(crop(worldclim,e))   #crop worldclim to desired extent "e"

#---Plot points on a map to confirm outliers have been successfully removed-------------------------------------------------------------------
data(wrld_simpl)

#Plot worldwide distribution to catch outliers
windows() #open in new window

plot(wrld_simpl, xlim=c(-180,180), ylim=c(-80,80), axes=TRUE, col="light yellow")#plot map

box() #add box around plot 

points(salsola.coords$Longitude, salsola.coords$Latitude, col='orange', pch=20, cex=0.75) #plot salsola points

points(salsola.coords$Longitude, salsola.coords$Latitude, col='red', cex=0.75) #add border

#---pseudo absence simulation------------------------------------------------------------------------------------------------------------------
#Define a mask from worldclim data
mask_land <- raster(worldclim.extent@layers[[1]])

# Create circles with a radius of 100 km
x <- circles(data.sdf, d=100000, lonlat=TRUE)
pol <- polygons(x)

# sample randomly from all circles-----------------------------------------------------------------------------------------------------------------------
samp1 <- spsample(pol, 1000, type='random', iter=25)
# Warning in proj4string(obj): CRS object has comment, which is lost in output
# get unique cells
cells <- cellFromXY(mask_land, samp1)
length(cells)
cells <- unique(cells)
length(cells)
xy <- xyFromCell(mask_land, cells)

#Plot the rando data we've sampled above on top of the circles.----------------------------------------------------------------------------------
#Our points don't look like they're in such a nice grid, because we're zoomed 
#out much further than the demo and don't have enough points in the plot. 
#I've upped the number of points to 1000, but you might increase later after debug.

dev.new()

plot(pol, axes=TRUE)

points(xy, cex=0.1, pch=20, col='blue')

#---Choosing a subset of abiotic rasters-------------------------------------------------------------------------------------------------------------
#Correlate abiotic layers to each other
WCcorr <- layerStats(worldclim.extent, 'pearson', na.rm=TRUE)

WCcorr <- WCcorr$`pearson correlation coefficient`
#write.csv(WCcorr, "correlationBioclim.csv") 

#Removing highly correlated layers---------------------------------------------------------------------------------------------------------------------
#c <- data.matrix(read.csv("correlationBioclim.csv", header = TRUE, row.names = 1,sep = ","))
WCcorr <- abs(WCcorr)

# Find rows that should be removed
kill_list <- findCorrelation(WCcorr, cutoff = 0.7, names = TRUE, exact = TRUE)

kill_list<-sort(kill_list)
# keep: bio2, bio3, bio5, bio8, bio9, bio12,  bio14, and bio18 
#Define Subset layers in worldclim

worldclim.subset<-dropLayer(worldclim.extent,kill_list) 

worldclim.subset.names<-names(worldclim.subset)

print(worldclim.subset.names)

#---Construct dataframe of presences and absences-------------------------------------------------------------------------------------------------------

presvals <- extract(worldclim.subset, salsola.coords)

# setting random seed to always create the same
# random set of points for this example

set.seed(0)
stvals <- extract(worldclim.subset, randomPoints(worldclim.subset, n_abs)) #simulated absence values

pb <- c(rep(1, nrow(presvals)), rep(0, nrow(stvals)))#presence & absence binary array

sdmdata <- data.frame(cbind(pb, rbind(presvals, stvals)))#dataframe of presence and absence data

#Plot using "pairs" function--------------------------------------------------------------------------------------------------------------------------------
pairs(sdmdata[,2:6], cex=0.1)

#Save outputs to be used in next section
saveRDS(sdmdata, "s_traguasdm.Rds")
saveRDS(presvals, "s_tragus_pvals.Rds")

#---Model Fitting
#GLM: Linear Models

m2 = glm(pb ~ ., data=sdmdata)

#MaxEnt model
#get maxent() #Run this prior to attempting to load dismo

#Create training data for maxent------------------------------------------------------------------------------------------------------------------------------
group<-kfold(salsola.coords,5)

pres_test <- salsola.coords[group==1, ]

pres_train <-salsola.coords[group !=1,]

#Run MaxEnt: I am not totally sure how "factors" works, ------------------------------------------------------------------------------------------------------
#but it tells MaxEnt which layers we want to consider "categorical". I've set
#this series of model creation steps up to try each of our chosen variables, 
#as well as one which doesn't specify a factor (max).

#FYI, MaxEnt has a "removeDuplicates" Boolean option which will remove points
#which fall in the same grid cell. That might be really useful if your abiotic
#rasters have varying resolution.
max <- maxent(worldclim.subset,salsola.coords)#####THIS SHOULD BE PRES_TRAIN#### 

the error i am getting:

Error in .local(x, p, ...) : 
  more than half of the presence points have NA predictor values

any help would be greatly appreciated

Upvotes: 1

Views: 803

Answers (1)

Robert Hijmans
Robert Hijmans

Reputation: 47391

The message "more than half of the presence points have NA predictor values" suggests that many of your points do not overlap with the raster data, or if they do, that there the cells are NA there.

I think the reason for this is that you use salsola.coords which you define as

data <- data[,c("Scientific.Names","Latitude","Longitude")]
salsola.coords<-data[,-1]

Whereas you probably should do

salsola.coords <- data[, c("Longitude", "Latitude")]

That is, have longitude and latitude in the correct order

Upvotes: 2

Related Questions