Camille Gontier
Camille Gontier

Reputation: 155

How to optimize the r parameter when fitting a single model to several datasets in spatstat?

I would like to fit a single model to several independent datasets in R using the spatstat package. Here, I have 3 independent datasets (ppp objects called NMJ1, NMJ2, and NMJ3), to which I want to fit a common model. The way to go should be to use the mppm function:

data <- listof(NMJ1,NMJ2,NMJ3)
data <- hyperframe(X=1:3, Points=data)
r <- matrix(c(120, 240, 240, 90), nrow = 2, ncol = 2)
model <- mppm(Points ~marks*abs(sqrt(x^2+y^2)), data, Strauss(r))

However, r is a free parameter, which I want to optimize. I started by doing:

ll <- -Inf
r_hat <- 0
for (r in seq(from=0.5, to=10, by=0.05)){
  fit1 <- ppm(NMJ1~marks*sqrt(x^2+y^2),Strauss(r))
  fit2 <- ppm(NMJ2~marks*sqrt(x^2+y^2),Strauss(r))
  fit3 <- ppm(NMJ3~marks*sqrt(x^2+y^2),Strauss(r))
  if(logLik.ppm(fit1)+logLik.ppm(fit2)+logLik.ppm(fit3) > ll) { 
    ll <- logLik.ppm(fit1)+logLik.ppm(fit2)+logLik.ppm(fit3)
    r_hat <- r
  }
}

(i.e. by finding the value of r that optimizes the sum of the log-likelihood for the 3 fits on my 3 datasets), which runs without warning; however, here I'm fitting 3 independent models on each dataset, while I want my model to be the same for all of them.

I then tried:

ll <- -Inf
r_hat <- 0
for (r in seq(from=0.5, to=10, by=0.05)){
  fit <- mppm(Points ~ marks*sqrt(x^2+y^2), data, Strauss(r))
  ll_temp <- logLik.mppm(fit)
  
  if(ll_temp > ll) { 
    ll <- ll_temp
    r_hat <- r
  }
}

which returns the following warning:

Warning message:
In logLik.mppm(fit) :
  log likelihood is not available for non-Poisson model; log-pseudolikelihood returned

Besides this warning, the returned values for r do not seem to be realistic (they are larger than the average distances between my points). My questions are thus the following:

  1. Is there a "clean" way to optimize the r parameter when using mppm ?
  2. On the statistics side, can r be computed analytically (from, for instance, the distribution of the distances between my points) ?

Upvotes: 0

Views: 170

Answers (1)

Adrian Baddeley
Adrian Baddeley

Reputation: 2973

This is not currently implemented in a neat way for mppm, that is, for fitting models to several point pattern datasets. It is on the "to do" list. (For fitting models to a single point pattern dataset, see the last paragraph below.)

Your code is OK, except for one problem: it assumes that it is valid to compare the log pseudolikelihood values of two models fitted with different values of r. That's not always true because, by default, ppm and mppm use the border method of edge correction, and by default, the border distance rbord is chosen to be equal to the interaction distance r. In your code, rbord is different for each model, so the pseudolikelihoods are not strictly comparable (effectively the models are based on different "sample sizes").

To avoid this problem, you can either explicitly set the border distance rbord to be equal to the maximum value of r that will be used:

mppm(Points ~ something, Strauss(r), rbord=rmax)

or specify another edge correction such as correction="iso" or correction="none" in the call to mppm. Either of these strategies will ensure that the pseudolikelihood values are comparable.

You noted that estimates of r obtained by your search procedure were unrealistic. This could be attributable to the issue discussed above. But it does also happen sometimes when the search domain is chosen to be too large (if you allowed the software to try unrealistic values).

Another option, which would be much faster, is to use mppm to fit a model with the PairPiece interaction, specifying a sequence of jump points r, then to plot the resulting fitted interaction (extracted from the model using fitin.) This will allow you to judge the most appropriate value of r, or at least the range of r that is appropriate. It also gives you a way to judge whether a threshold type interaction model is appropriate. See the bottom of page 517 and left panel of Figure 13.19 in the spatstat book.

In the jargon, r is called an irregular parameter. As explained in the spatstat package documentation and the spatstat book, only the regular parameters are estimated by ppm or mppm, and the irregular parameters must be determined in some other way. See Sections 9.12 and 13.6.3 of the spatstat book.

Because r is a distance threshold, the likelihood or pseudolikelihood is not continuous as a function of r. So the answer to your second question is strictly "no", there is no analytic formula for estimating r.

For fitting a model to a single point pattern dataset, there are two functions that allow estimation of the interaction range: profilepl which uses a method similar to the one you have implemented above, and ippm which solves the score equation analytically. To fit a Strauss model to a single dataset where r has to be estimated, the only option supported is profilepl because the Strauss model is not differentiable with respect to r. Very little information is known about the statistical performance of estimators of interaction range. This is still a research problem.

Upvotes: 3

Related Questions