Reputation: 155
I would like to fit a single model to several independent datasets in R
using the spatstat
package. Here, I have 3 independent datasets (ppp
objects called NMJ1
, NMJ2
, and NMJ3
), to which I want to fit a common model. The way to go should be to use the mppm
function:
data <- listof(NMJ1,NMJ2,NMJ3)
data <- hyperframe(X=1:3, Points=data)
r <- matrix(c(120, 240, 240, 90), nrow = 2, ncol = 2)
model <- mppm(Points ~marks*abs(sqrt(x^2+y^2)), data, Strauss(r))
However, r
is a free parameter, which I want to optimize. I started by doing:
ll <- -Inf
r_hat <- 0
for (r in seq(from=0.5, to=10, by=0.05)){
fit1 <- ppm(NMJ1~marks*sqrt(x^2+y^2),Strauss(r))
fit2 <- ppm(NMJ2~marks*sqrt(x^2+y^2),Strauss(r))
fit3 <- ppm(NMJ3~marks*sqrt(x^2+y^2),Strauss(r))
if(logLik.ppm(fit1)+logLik.ppm(fit2)+logLik.ppm(fit3) > ll) {
ll <- logLik.ppm(fit1)+logLik.ppm(fit2)+logLik.ppm(fit3)
r_hat <- r
}
}
(i.e. by finding the value of r
that optimizes the sum of the log-likelihood for the 3 fits on my 3 datasets), which runs without warning; however, here I'm fitting 3 independent models on each dataset, while I want my model to be the same for all of them.
I then tried:
ll <- -Inf
r_hat <- 0
for (r in seq(from=0.5, to=10, by=0.05)){
fit <- mppm(Points ~ marks*sqrt(x^2+y^2), data, Strauss(r))
ll_temp <- logLik.mppm(fit)
if(ll_temp > ll) {
ll <- ll_temp
r_hat <- r
}
}
which returns the following warning:
Warning message:
In logLik.mppm(fit) :
log likelihood is not available for non-Poisson model; log-pseudolikelihood returned
Besides this warning, the returned values for r
do not seem to be realistic (they are larger than the average distances between my points). My questions are thus the following:
r
parameter when using mppm
?r
be computed analytically (from, for instance, the distribution of the distances between my points) ?Upvotes: 0
Views: 170
Reputation: 2973
This is not currently implemented in a neat way for mppm
, that is, for fitting models to several point pattern datasets. It is on the "to do" list.
(For fitting models to a single point pattern dataset, see the last paragraph below.)
Your code is OK, except for one problem: it assumes that it is valid to compare the log pseudolikelihood values of two models fitted with different values of r
. That's not always true because, by default, ppm
and mppm
use the border method of edge correction, and by default, the border distance rbord
is chosen to be equal to the interaction distance r
. In your code, rbord
is different for each model, so the pseudolikelihoods are not strictly comparable (effectively the models are based on different "sample sizes").
To avoid this problem, you can either explicitly set the border distance rbord
to be equal to the maximum value of r
that will be used:
mppm(Points ~ something, Strauss(r), rbord=rmax)
or specify another edge correction such as correction="iso"
or correction="none"
in the call to mppm
. Either of these strategies will ensure that the pseudolikelihood values are comparable.
You noted that estimates of r
obtained by your search procedure were unrealistic. This could be attributable to the issue discussed above. But it does also happen sometimes when the search domain is chosen to be too large (if you allowed the software to try unrealistic values).
Another option, which would be much faster, is to use mppm
to fit a model with the PairPiece
interaction, specifying a sequence of jump points r
, then to plot the resulting fitted interaction (extracted from the model using fitin
.) This will allow you to judge the most appropriate value of r
, or at least the range of r
that is appropriate. It also gives you a way to judge whether a threshold type interaction model is appropriate. See the bottom of page 517 and left panel of Figure 13.19 in the spatstat book.
In the jargon, r
is called an irregular parameter. As explained in the spatstat
package documentation and the spatstat book, only the regular parameters are estimated by ppm
or mppm
, and the irregular parameters must be determined in some other way. See Sections 9.12 and 13.6.3 of the spatstat book.
Because r
is a distance threshold, the likelihood or pseudolikelihood is not continuous as a function of r
. So the answer to your second question is strictly "no", there is no analytic formula for estimating r
.
For fitting a model to a single point pattern dataset, there are two functions that allow estimation of the interaction range: profilepl
which uses a method similar to the one you have implemented above, and ippm
which solves the score equation analytically. To fit a Strauss model to a single dataset where r
has to be estimated, the only option supported is profilepl
because the Strauss model is not differentiable with respect to r
. Very little information is known about the statistical performance of estimators of interaction range. This is still a research problem.
Upvotes: 3