I am trying to test for spatial autocorrelation in some binary data (i.e. whether presence is more likely at a point near another presence than at one further away). The data cover multiple sites, with repeated (daily) measures at each site. I'm not interested in testing the autocorrelation between the repeated points within a site, but rather between sites. I'm not sure whether I need to integrate a temporal component into this (it is not included in the example below).
To this end I have created a neighbour structure based on the mean location of each site. I guess I should then calculate the join count statistic on the whole dataset. However, I'm unsure, firstly, whether this is the right approach and, secondly, how to apply a neighbourhood structure calculated from the site mean locations to the larger dataset.
The code below creates a reprex of the problem and where I have got to.
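# Load packages used below (dplyr for %>%, sp for coordinates/CRS, spdep for dnearneigh)
library(dplyr)
library(sp)
library(spdep)
# Fix the RNG seed so the reprex is reproducible
set.seed(1)
# Set the number of sites
num_sites <- 10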
# Generate random latitude and longitude for each site
site_latitude <- runif(num_sites, min = -2, max = 2)
site_longitude <- runif(num_sites, min = 71, max = 74)
# Initialize vectors to store latitude and longitude
latitude <- numeric()
longitude <- numeric()
# Number of rows (observations) per site
num_rows_per_site <- 300 / num_sites
# Generate latitude and longitude rows with spatial variation within each site
for (i in seq_len(num_sites)) {
  # Generate random offsets for latitude and longitude within each site
  latitude_offsets <- rnorm(num_rows_per_site, mean = 0, sd = 0.05)
  longitude_offsets <- rnorm(num_rows_per_site, mean = 0, sd = 0.05)
  site_lat <- rep(site_latitude[i], each = num_rows_per_site) + latitude_offsets
  site_lon <- rep(site_longitude[i], each = num_rows_per_site) + longitude_offsets
  latitude <- c(latitude, site_lat)
  longitude <- c(longitude, site_lon)
}
# Generate random dates within a range
start_date <- as.Date("2020-01-01")
end_date <- as.Date("2024-01-01")
date <- sample(seq(start_date, end_date, by = "day"), 300, replace = TRUE)
# Generate site_ID as a sequence
site_ID <- rep(1:num_sites, each = num_rows_per_site)
# Generate presence/absence binary column
presence <- sample(c(0, 1), 300, replace = TRUE)
# Create data frame
df <- data.frame(latitude = latitude,
                 longitude = longitude,
                 date = date,
                 site_ID = site_ID,
                 presence = presence)
# Calculate mean locations for each site
site_means <- df %>%
  group_by(site_ID) %>%
  summarise(mean_lat = mean(latitude), mean_lon = mean(longitude)) %>%
  as.data.frame()  # plain data.frame before converting to an sp object
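# Convert mean locations to SpatialPoints
coordinates(site_means) <- c("mean_lon", "mean_lat")
# Coordinates are longitude/latitude in degrees, so use a geographic CRS rather than UTM
proj4string(site_means) <- CRS("+proj=longlat +datum=WGS84")
# Create neighbour object based on mean locations; with longlat = TRUE, d1 and d2 are in km
# (d2 = 1000 km links every pair of sites in this example, so a smaller threshold may be needed)
site_nb <- dnearneigh(coordinates(site_means), d1 = 0, d2 = 1000, longlat = TRUE)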
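For illustration, here is a minimal sketch of how the site-level join count step might look. It is not a confirmed solution. It is self-contained and uses hypothetical toy data rather than `df` above. It also makes two assumptions that are not part of the code so far: the repeated measures are collapsed to one value per site with an arbitrary "present on at least 50% of days" rule, and the neighbours come from `knearneigh()` (3 nearest sites) instead of `dnearneigh()`, so that every site is guaranteed to have neighbours.

```r
library(spdep)

set.seed(42)
n_sites <- 10
n_days  <- 30

# Hypothetical site centroids (longitude/latitude in decimal degrees)
site_xy <- cbind(lon = runif(n_sites, 71, 74), lat = runif(n_sites, -2, 2))

# Hypothetical daily presence/absence records; sites alternate between a
# low (0.1) and high (0.9) presence probability purely for illustration
obs <- data.frame(site_ID = rep(seq_len(n_sites), each = n_days))
site_prob <- rep(c(0.1, 0.9), length.out = n_sites)
obs$presence <- rbinom(nrow(obs), size = 1, prob = site_prob[obs$site_ID])

# Collapse repeated measures to one value per site
# (assumed rule: "present" if recorded on at least 50% of days)
site_prop <- tapply(obs$presence, obs$site_ID, mean)
site_presence <- factor(as.integer(site_prop >= 0.5), levels = c(0, 1))

# Neighbour structure from the site centroids: 3 nearest sites, made symmetric
site_nb <- knn2nb(knearneigh(site_xy, k = 3, longlat = TRUE), sym = TRUE)

# Binary weights are a common choice for join counts
site_lw <- nb2listw(site_nb, style = "B")

# Join count test on the site-level factor; one result per factor level
jc <- joincount.test(site_presence, site_lw)
print(jc)
```

Collapsing to one value per site discards the temporal information. Whether that is acceptable, or whether the test should be run per day or per period instead, is exactly the part I am unsure about.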