David M. Kaplan

Reputation: 807

How slow is too slow when kriging with gstat in R

I am trying to use the krige function in the gstat package of R to interpolate some spatial ocean depth data in R. I am finding that for more than about 1,000 points, the function starts taking unreasonable amounts of time to finish (i.e., hours to days, or it never finishes at all). Is this normal or am I doing something wrong? I am particularly concerned because my eventual goal is to do spatio-temporal kriging of a very large dataset (>30,000 data points) and I am worried that it just won't be feasible given these run times.

I am running gstat-1.1-3 and R-3.3.2. Below is the code I am running:

library(sp); library(raster); library(gstat)
# v.utm is a SpatialPointsDataFrame with >30,000 points

# Remove points with identical positions
zd = zerodist(v.utm)
nzd = v.utm[-zd[,1],] # Layer with no identical positions

# Make a raster layer covering point layer
resolution=1e4
e = extent(as.matrix(v.utm@coords))+resolution
r = raster(e,resolution=resolution) 
proj4string(r) = proj4string(v.utm)

# r is a 181x157 raster

# Fit variogram
depth.fit = fit.variogram(variogram(AVGDEPTH~1, nzd), model=vgm(6000,"Exp",1,5e5,1))

# Krige on random sample of 500 points - works fine
size=500
ss=nzd[sample.int(nrow(nzd),size),]
depth.krig = krige(AVGDEPTH~1,ss,as(r,"SpatialPixelsDataFrame"),
               model=depth.fit)

# Krige on random sample of 5000 points - never seems to end
size=5000
ss=nzd[sample.int(nrow(nzd),size),]
depth.krig = krige(AVGDEPTH~1,ss,as(r,"SpatialPixelsDataFrame"),
               model=depth.fit)

Upvotes: 2

Views: 3345

Answers (2)

Barbara

Reputation: 188

A much faster alternative to kriging for large datasets is griddify in the marmap package. It took me a while to find this, but it works well. It uses bilinear interpolation, and although it is designed for bathymetric maps, it works with any xyz data.
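A minimal sketch of how that might be called, assuming v.utm is the SpatialPointsDataFrame from the question with depths in its AVGDEPTH column (the grid dimensions are placeholder values, not recommendations):

library(sp); library(marmap)

# Build a plain x/y/z table from the point layer
# (v.utm and AVGDEPTH are taken from the question; adjust to your data)
xyz = data.frame(x = coordinates(v.utm)[,1],
                 y = coordinates(v.utm)[,2],
                 z = v.utm$AVGDEPTH)

# Bilinear interpolation onto a regular grid; nlon/nlat set the
# output grid size and are only example values here
depth.grid = griddify(xyz, nlon = 157, nlat = 181)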

Upvotes: 1

Edzer Pebesma

Reputation: 4121

The complexity of the Cholesky decomposition (or similar) is O(n^3), meaning that if you multiply the number of points by 10, the time it takes increases by a factor of 1000. There are two ways out of this problem, at least as far as gstat is concerned:

  1. Install an optimized version of BLAS (e.g. OpenBLAS or MKL). This does not solve the O(n^3) problem, but may give a speedup of at most a factor of n, where n is the number of cores available.
  2. Avoid decomposing the full covariance matrix by choosing local neighbourhoods (arguments maxdist and/or nmax); see the sketch below this list.
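
A minimal sketch of option 2, reusing the objects from the question (nzd, r and the fitted variogram model depth.fit); the nmax and maxdist values are placeholders, not recommendations:

library(sp); library(raster); library(gstat)

# Local kriging: each prediction uses only the nmax nearest
# observations within maxdist, so the full n x n covariance
# matrix is never built or decomposed
depth.krig = krige(AVGDEPTH~1, nzd, as(r,"SpatialPixelsDataFrame"),
               model = depth.fit,
               nmax = 50,      # use at most the 50 nearest points per prediction
               maxdist = 1e5)  # ignore points beyond 1e5 map units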

Upvotes: 4
