Reputation: 31
Let me start off by mentioning that I have no training in statistics and I am new in this field for both spatial statistics and coding in R. However, I am trying to learn how to identify and characterize the spatial distribution of point plotted patterns and between different types of points (marks), as I need to do so for my thesis. But I have reached a point now at which I need help to understand whether I am on the right track or rather if I am using the right commands. Further, I have some more questions all listed below.
The underlying situtation is that I want to decipher intra-site patterns at an archaeological excavation site. The excavation site was divided into 1 square meter grids. Each point represents an artefact with a categorical (factor-valued) mark. My primary objective is to identify spatial relationships between artifacts with different attributes. For doing so, I have first plottet a point pattern map and performed a quadrat test to identify the spatial character (trying to reject the null hypothesis, hence CSR), and then performed Ripley's K function to identify wheter the point process is aggregated or segregated. The p value of the quadrat test is < 0.05 and Ripley's K function reveals that the point pattern shows - as clearly displayed by a density plot (figure 3) - aggregations. As my results so far indicate an inhomogeneous pattern, I have then performed each test taking intensity into account. Again, the quadrat test with intenstiy taken into account rejects the null hypothesis. I did upload my dataset, as well as the entire script and workspace to microsoft onedrive, because the dataset is too large (or is there any better way to share the data?) Here's the link to the data: Download the dataset
# ----------------------------------------------------------------------
# Prepare data
# ----------------------------------------------------------------------
# Read in data file
roma <- read.csv(file = "roma_points.csv", header = T)
# Define the window (class owin) for analyses with three holes
W <- owin(poly=list(list(x=c(100, 104, 104, 102, 102, 100),
y=c(100, 100, 106, 106, 110, 110)),
list(x=c(100, 100, 101, 101), y=c(102, 103, 103, 102)),
list(x=c(100, 100, 101, 101), y=c(106, 107, 107, 106)),
list(x=c(100, 100, 101, 101), y=c(107, 108, 108, 107))))
# Define ppp object based on this project's point locations
roma.ppp <- ppp(x = roma$EAST,
y = roma$NORTH,
window = W,
marks = factor(roma$MARK))
roma.ppp <- as.ppp(roma.ppp)
unitname(roma.ppp) <- c("metre", "metres")
# Plotting point pattern map
plot(roma.ppp, cex = 0.5, main = "Point pattern map", axes = F, ann = F,
cols = c("blue", "black", "red", "green", "sienna"))
# plot the point distribution of each mark
plot(split(roma.ppp), cex = 0.5, main = "Point pattern map of individual marks")
# ----------------------------------------------------------------------
# Chi-squared test with inhomogeneity taken into account
# ----------------------------------------------------------------------
roma.squared.inhom <- quadrat.test(roma.ppp, nx = 4, ny = 10,
method = "MonteCarlo",
lambda = density(roma.ppp))
# plot density map and display grid info
plot(density(roma.ppp),
main = "Density plot + chi-squared results for Inhomogeneous Density")
plot(roma.squared.inhom, cex = 0.5, col = "white", add = T)
Summary of my point pattern:
Marked planar point pattern: 2351 points
Average intensity 81.06897 points per square metre
*Pattern contains duplicated points*
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 metres
Multitype:
frequency proportion intensity
A 671 0.28541050 23.1379300
B 454 0.19310930 15.6551700
C 27 0.01148447 0.9310345
D 191 0.08124202 6.5862070
E 1008 0.42875370 34.7586200
Window: polygonal boundary
single connected closed polygon with 14 vertices
enclosing rectangle: [100, 104] x [100, 110] metres
(4 x 10 metres)
Window area = 29 square metres
Unit of length: 1 metre
Fraction of frame area: 0.725
Now, what I ultimately want to achieve is to be able to make statements/assumptions about how different artefact types are distributed to each other, more specificaly is for example
artefact type A clustered to artefact type D?
My (vague!) understanding is that I need to perform the estimate of the inhomogeneous version of the cross-type L function for a multitype pattern: lcross.inhom
.
Now, my questions are:
I. The correct approach?
Is it correct to assume that I am best served with lcross.inhom
for this, and specifically with intensity taken into account as my point pattern is inhomogeneous?
I have read that some suggested to perform a loval bivariate K function (localKcross
command) for this. But I don't understand the results and I - quite frankly - don't really understand the difference.
II. How to correctly perform Lcross.inhom
?
Regarding calculating the Lcross.inhom
function. I have figured that I have to adjust either the intensities of the differently marked points or fit a Poisson model to test for clustering in the presence of spatial inhomogeneity (i.e. the null hypothesis should be an inhomogeneous Poisson Process).
I tried two different approaches:
# ----------------------------------------------------------------------
# Ripley's K Function with inhomogeneity taken into account
# ----------------------------------------------------------------------
# Approach No1 ---------------------------------------------------------
# Calculating bandwidth sigma for the kernel estimator of point process
# intensity with cross validated bandwidth selection (bw.scott)
sigmaA <- with(split(roma.ppp), bw.scott(A))
sigmaB <- with(split(roma.ppp), bw.scott(B))
sigmaC <- with(split(roma.ppp), bw.scott(C))
sigmaD <- with(split(roma.ppp), bw.scott(D))
# Calculating with the Kernel smoothed intensity of point pattern
# (density.ppp) the values of the estimated intensity of the sub-
# process of points of type i and type j
lambdaA <- with(split(roma.ppp), density(A, sigmaA))
lambdaB <- with(split(roma.ppp), density(B, sigmaB))
lambdaC <- with(split(roma.ppp), density(C, sigmaC))
lambdaD <- with(split(roma.ppp), density(D, sigmaD))
# Calculating and plotting Lross.inhom with estimated intensities
# by non-parametric smoothing
par(mfrow = c(1, 3))
plot(Lcross.inhom(roma.ppp, "A", "B", lambdaA, lambdaB), . - r ~ r,
main = "Approach 1 (A against B)", legend = FALSE)
plot(Lcross.inhom(roma.ppp, "A", "C", lambdaA, lambdaC), . - r ~ r,
main = "Approach 1 (A against C)", legend = FALSE)
plot(Lcross.inhom(roma.ppp, "A", "D", lambdaA, lambdaD), . - r ~ r,
main = "Approach 1 (A against D)", legend = FALSE)
# Approach No2---------------------------------------------------------
# Fit parametric intensity model, by first fitting a
# Poisson point process model to the observed data
fit <- ppm(roma.ppp ~marks * polynom(x,y,3))
# evaluate fitted intensities at data points
# (these are the intensities of the sub-processes of each type)
inten <- fitted(fit, dataonly=TRUE)
# split according to types of points
lambda <- split(inten, marks(roma.ppp))
# Calculate and plot Lcross.inhom for different marks
par(mfrow = c(1, 3))
plot(Lcross.inhom(roma.ppp, "A", "B", lambda$A, lambda$B), . - r ~ r,
main = "Approach 2 (A against B)", legend = FALSE)
plot(Lcross.inhom(roma.ppp, "A", "C", lambda$A, lambda$C), . - r ~ r,
main = "Approach 2 (A against C)", legend = FALSE)
plot(Lcross.inhom(roma.ppp, "A", "D", lambda$A, lambda$D), . - r ~ r,
main = "Approach 2 (A against D)", legend = FALSE)
However, the results of each of those two approaches are very different. Whereas the 1st approach indicates clustering between type A and type B but only up to circa r=0.3 m, the 2nd approach indicates clustering between those two type for more than 1 meter. Quite similar between type A and type D. On the other hand, between type A and C approach 1 indicates clustering again until circa r=0.3 m whereas approach 2 only indicates clustering uo to r=0.8 m.
Now, I assume that I might not have enough points of mark C, and therefore obtain questionable results.
But for the other types, I am not sure which approach to choose. If I compare the plots with the point pattern maps of each mark, the 1st approach seems to be better fitting.
Though, I'm not entirely sure. Did I make a mistake somewhere in computing the Lcross.inhom function for my point pattern, or have I missunderstood how to correctly interpret the plots? Could it be generally difficult to draw conclusions here because the results are only plotted until r=1 meter? I wasn't sure how to change that, I have read that I could determine r for each mark with
nncross
?
III. How can I visualize the outcome on my point pattern map?
Is there maybe any other way to better display the outcome of lcross.inhom
with a point plotted map?
I'd like to visualize where patterns of clustering and dispersion can be observed, at what level of significance and at what scale.
Let's say I'd like to plot the point pattern map with the two types I am testing against and then highlight the areas within they are clustered to each other.
I've read that I could do so by computing the K function for each observed point, and then plot its own level of significance for the rejection of the null-hypothesis at a given distance d.
Unfortunately, I have absoutely no clue how to do so.
I hope this isn't too confusing and what I am asking for isn't too much. I have searched and read a lot already, respectively the corresponding chapters in the spatstat book, but I must admit that I found most references rather heavy going as a non-statistician. And at this point I am simply missing the forest for the trees. I hope it all makes sense and are able you to understand what it is I have done/tried so far. I am sorry for the long read, but I tried to be as precise as possible. I am also happy to receive tips or suggestions for improvement on how I could have made my first post better.
My thanks and appreciation. All the best, and stay safe
Scotty
Upvotes: 2
Views: 869
Reputation: 2973
This is a late answer, but in case anyone is reading this post, here is some advice.
Before investigating second-order effects like clustering, you need to investigate the first-order characteristics, that is, the spatially varying density of artefacts of each kind. You can do that with density.splitppp
and relrisk.ppp
. Example:
plot(density(split(roma.ppp), bw.diggle))
plot(relrisk(roma.ppp, bw.relrisk))
The first array of plots shows the estimated spatially-varying intensity (number of artefacts per square metre) for each type of artefact. The second array shows the spatially-varying probability that an artefact belongs to a given type.
These plots can be tweaked by changing the smoothing bandwidth sigma
(I recommend using the argument adjust
to adjust the automatically-determined bandwidth) until you are satisfied that they are reasonable, i.e. neither undersmoothed nor oversmoothed.
The K and L functions assume that the point pattern is homogeneous, which is clearly not true for your data. There are versions of the K and L functions that are able to account for inhomogeneous density of the pattern but these require that the density has been estimated accurately. So these calculations should use the results of density.ppp
or density.splitppp
that you have chosen from the first step.
Since there are very few artefacts of type C it may be wise to merge this category into one of the other categories.
Upvotes: 1