Reputation: 15
I am trying to do k-fold cross-validation for Kernel Density Estimation (KDE), which requires me to take the integral of a squared density approximation. This is my current setup:
# Read in crime data from csv file and select only latitude, longitude columns
crimeData <- read.csv("Crimes-Map.csv")
crimeData <- crimeData[, 15:16]
# Remove NAs from the data
cleanData <- crimeData[complete.cases(crimeData), ]
lat <- cleanData[[1]]
lon <- cleanData[[2]]
## Extra code to split my data into k sets ##
hSeq <- seq(0.001, 0.05, by=0.001)
JHLat <- numeric(length(hSeq))
for (i in seq_along(hSeq)) {
  h <- hSeq[i]
  ## Lots of code that's running fine ##
  # Get density estimate for full data to compute J(h)
  latDensity <- density(lat, bw = h)
  latDensFunc <- approxfun(latDensity)
  latSq <- function(x) latDensFunc(x)^2
  # Get support of this density so we integrate over the right region
  loLat <- min(latDensity$x)
  hiLat <- max(latDensity$x)
  JHLat[i] <- integrate(latSq, lower = loLat, upper = hiLat)$value # + other term coming from rest of code
}
Everything works fine up to the integrate function, where I get the following error:
Error in integrate(lonSq, lower = loLon, upper = hiLon) : non-finite function value
I've plotted latSq and also checked that there are no infinite or NaN values over the support of the density using the following commands:
xs <- seq(loLat, hiLat, by = 1e-4)
any(is.infinite(latSq(xs)))
any(is.nan(latSq(xs)))
Both return FALSE. For reference, loLat is 36.61645 and hiLat is 42.02557. I know that latDensity has a maximum value of 29.979, so latSq should have a maximum value of 898.7404, which I'm pretty sure isn't big enough to cause an issue. So I'm very confused as to what is happening and how I can fix it. Any help would be much appreciated.
Upvotes: 0
Views: 178
Reputation: 4233
You should be checking the interpolated values of your function, not the integration domain. One of the values must be NaN, and I can't provide more information because there is no data.
I can only guess what is going on. Since you are using approxfun, the interpolated function (latDensFunc) returns NaN for one of the points. This is what leads to NaNs in lonSq.
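Since the data isn't available, here is only a rough sketch of how one might check this on simulated values. Everything in it is an assumption for illustration: lat is synthetic, and densFunc, sqLogged, bad and densFunc0 are hypothetical names. With its default rule = 1, approxfun returns NA for any point outside the range of the grid, and is.nan() and is.infinite() both return FALSE for NA, so is.finite() is the safer check. The yleft/yright workaround at the end is one possible option, not necessarily the right fix for your data.

```r
# Synthetic stand-in for the latitude column (the real data is not available)
set.seed(1)
lat <- rnorm(500, mean = 40, sd = 1)

dens <- density(lat, bw = 0.2)
densFunc <- approxfun(dens)   # default rule = 1: NA outside range(dens$x)

# Outside the grid the interpolant returns NA, which is.nan() and
# is.infinite() both report as FALSE; only is.finite() flags it
outside <- max(dens$x) + 1
naMissedByChecks <- is.na(densFunc(outside)) &&
  !is.nan(densFunc(outside)) &&
  !is.infinite(densFunc(outside))

# Hypothetical helper: record every point where the integrand is not finite
bad <- numeric(0)
sqLogged <- function(x) {
  y <- densFunc(x)^2
  bad <<- c(bad, x[!is.finite(y)])
  y
}
res <- try(integrate(sqLogged, min(dens$x), max(dens$x)), silent = TRUE)
# After a failure, inspect `bad` to see which evaluation points caused it

# One possible workaround (an assumption): treat the density as 0 off the grid
densFunc0 <- approxfun(dens, yleft = 0, yright = 0)
sq0 <- function(x) densFunc0(x)^2
val <- integrate(sq0, min(dens$x), max(dens$x),
                 subdivisions = 2000L, stop.on.error = FALSE)$value
```

If bad comes back non-empty on the real data, those are the x values to look at. Running the same check on the longitude version (lonSq) makes sense too, since that is the function named in the error message.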
Upvotes: 1