ahummels

Reputation: 15

R integrate() function returns 'non-finite function value'

I am trying to do k-fold cross-validation for kernel density estimation (KDE), which requires me to integrate the square of a density estimate. This is my current setup:

# Read in crime data from csv file and select only latitude, longitude columns
crimeData <- read.csv("Crimes-Map.csv")
crimeData <- crimeData[, 15:16]
# Remove NAs from the data
cleanData <- crimeData[complete.cases(crimeData), ]
lat <- cleanData[[1]]
lon <- cleanData[[2]]

## Extra code to split my data into k sets ##

hSeq <- seq(0.001, 0.05, by=0.001)
JHLat <- numeric(length(hSeq))
for (i in 1:length(hSeq)) {
  h <- hSeq[i]

  ## Lots of code that's running fine ##

  # Get density estimate for full data to compute J(h)
  latDensity <- density(lat, bw = h)
  latDensFunc <- approxfun(latDensity)
  latSq <- function(x) latDensFunc(x)^2
  # Get support of this density so we integrate over the right region
  loLat <- min(latDensity$x)
  hiLat <- max(latDensity$x)
  JHLat[i] <- integrate(latSq, lower = loLat, upper = hiLat)$value # + other term coming from rest of code
}

Everything works fine up to the integrate function, where I get the following error:

Error in integrate(lonSq, lower = loLon, upper = hiLon) : non-finite function value

I've plotted latSq and also checked that there are no infinite or NaN values over the support of the density, using the following commands:

xs <- seq(loLat, hiLat, by = 1e-4)
any(is.infinite(latSq(xs)))
any(is.nan(latSq(xs)))

Both return FALSE. For reference, loLat is 36.61645 and hiLat is 42.02557. I know that latDensity has a maximum value of 29.979, so latSq should have a maximum value of 898.7404, which I'm pretty sure isn't big enough to cause an issue. So I'm very confused as to what is happening and how I can fix it. Any help would be much appreciated.

Upvotes: 0

Views: 178

Answers (1)

slava-kohut

Reputation: 4233

You should be checking the interpolated values of your function, not just the integration domain. One of the values must be non-finite (NA, NaN, or infinite), and I can't provide more information because there is no data.

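I can only guess what is going on. Since you are using approxfun, the interpolated function (latDensFunc) probably returns a missing value at one of the points: by default, approxfun returns NA for any x outside the range of the points it was built from. This is what leads to non-finite values in lonSq. Note that is.nan() returns FALSE for NA, so a check with is.nan() and is.infinite() alone will not catch this; use is.finite() or is.na() instead.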
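Since the original data is not available, here is a minimal sketch of that guess using synthetic data. The simulated `lat` vector, the bandwidth, and the names `f_default`, `f_zero`, and `outside` are all assumptions made for illustration. It shows that approxfun returns NA outside its grid, that integrating over limits slightly wider than the grid raises an error, and one possible workaround (returning 0 outside the grid via `yleft` and `yright`).

```r
# Synthetic stand-in for the latitude column (assumption: real data unavailable)
set.seed(1)
lat <- rnorm(500, mean = 40, sd = 1)

latDensity <- density(lat, bw = 0.2)

# Default approxfun (rule = 1) returns NA outside the range of latDensity$x
f_default <- approxfun(latDensity)
outside <- max(latDensity$x) + 1
f_default(outside)             # NA
is.nan(f_default(outside))     # FALSE: is.nan() does not flag NA
is.finite(f_default(outside))  # FALSE: is.finite() does flag it

# Integrating over limits slightly wider than the grid hits those NA values
msg <- tryCatch(
  integrate(function(x) f_default(x)^2,
            lower = min(latDensity$x) - 0.1,
            upper = max(latDensity$x) + 0.1)$value,
  error = function(e) conditionMessage(e)
)
msg

# Possible workaround: treat the density as 0 outside the grid
f_zero <- approxfun(latDensity, yleft = 0, yright = 0)
res <- integrate(function(x) f_zero(x)^2,
                 lower = min(latDensity$x),
                 upper = max(latDensity$x),
                 subdivisions = 1000L)
res$value
```

If this is the cause, it is also worth checking that the limits passed to integrate() come from the same density object as the function being integrated (the error message refers to lonSq, loLon, and hiLon, while the checks in the question were run on latSq).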

Upvotes: 1
