Performing spatial regressions with huge datasets without crashing the computer

I'm running a spatial regression (SAR, spatial autoregressive model) in R using the spmodel package. My dataset is an sf object of centroids.

This dataset has 100,781 observations, and I created a neighbour list with 3 neighbours for each observation.

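For context, the neighbour list was built roughly like this (a sketch using spdep's k-nearest-neighbour helpers; the real code may differ slightly):

library( "sf" )
library( "spdep" )

# Sketch: build a 3-nearest-neighbour weights list from the centroid geometries.
# 'subset' is the sf object of centroids used in the model call below.
coords <- st_coordinates( subset )        # point coordinates as a matrix
knn    <- knearneigh( coords, k = 3 )     # 3 nearest neighbours per observation
nb     <- knn2nb( knn )                   # convert to a neighbours (nb) object
NbList <- nb2listw( nb, style = "W" )     # row-standardised spatial weights list
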
With this information, I run the following code:

library( "spmodel" )

Model_splm <- splm( formula       =  variables, 
                    data          =  subset, 
                    listw         =  NbList, 
                    model         = "pooling",
                    lag           =  TRUE,
                    spatial.error = "b",
                    parallel      =  TRUE,
                    local         =  list( parallel = TRUE )
                    ) 

As you can see, I'm using parallel processing to speed up this regression (my computer has 8 cores and 16 logical processors). However, I'm not able to run this code because my computer crashes. I can only run this regression when the number of observations is around 40,000.

Does anyone have a suggestion for how to run this regression on my computer? Or any other suggestions? This is just one example; I need to run multiple spatial regressions, and I've been struggling with this.

Upvotes: 4

Views: 142

Answers (1)

user3666197

Reputation: 1

Q1 :
" Or any other suggestions? "

Well, the spmodel authors' paper is brutally open about the computational complexity being rather prohibitive for naive use on un-curated data, explicitly warning about the exponential, cubic and quadratic (in sample size) time and space costs that limit how far this package can be pushed:

The computational cost associated with model fitting is exponential in the sample size for all estimation methods. For maximum likelihood and restricted maximum likelihood, the computational cost of estimating θ is cubic. For semivariogram weighted least squares and semivariogram composite likelihood, the computational cost of estimating θ is quadratic. The computational cost associated with estimating β and prediction is cubic in the model-fitting sample size, regardless of estimation method. Typically, sample sizes approaching 10,000 make the computational cost of model fitting and prediction infeasible, which necessitates the use of big data methods.

spmodel offers big data methods for model fitting of point-referenced data via the local argument to splm(). The method is capable of quickly fitting models with hundreds of thousands to millions of observations. Because of the neighborhood structure of areal data, the big data methods used for point-referenced data do not apply to areal data. Thus, there is no big data method for areal data or local argument to spautor(), so model fitting sample sizes cannot be too large.

spmodel offers big data methods for prediction of point-referenced data or areal data via the local argument to predict(), capable of quickly predicting hundreds of thousands to millions of observations.

Rather clear & sound, isn't it?
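In practice, for point-referenced data such as the asker's centroids, that local argument is passed straight to splm(). A minimal sketch, assuming spmodel's splm() interface; the formula, data object, covariance type and core count below are illustrative placeholders, not taken from the question:

library( "spmodel" )

# Sketch of a big data fit using spmodel's local approximation.
# 'y ~ x1 + x2' and 'centroids_sf' are hypothetical placeholders.
fit_big <- splm( formula    = y ~ x1 + x2,
                 data       = centroids_sf,       # sf object of point centroids
                 spcov_type = "exponential",      # a spatial covariance for point data
                 estmethod  = "reml",
                 local      = list( parallel = TRUE,   # fit local groups in parallel
                                    ncores   = 8 )
                 )
summary( fit_big )

# Prediction has an analogous big data option via the local argument:
# preds <- predict( fit_big, newdata = new_points_sf, local = TRUE )

Further options for local (how observations are split into groups, group sizes, etc.) are documented in ?splm.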


Q2 :
" Does anyone have a suggestion of how to be able to perform that regression in my computer? "

Besides a few tricks already offered in the paper to reduce those costs by pre-clustering and the like, you can try to benefit from modern hypervisor-coordinated, ultra-scaled computing fabrics, such as the reverse-hypervisor computing infrastructure invented by Dr. Isaac R. Nassi. In spite of all the technology in the background, it still appears to your operating system and your R code as a single, immensely large PC with an almost infinite amount of RAM and an almost infinite number of CPU cores, all coordinated by the reverse hypervisor (the patented technology was recently acquired by a genuinely big-data vendor, handling far more than a few hundred thousand points, so it should be available for academic or similar efforts to run a proof of concept on their premises).

Doing the same on an ordinary computer seems a priori not possible, as the authors warned.

Upvotes: 0
