Reputation: 145
I have a dataframe with vectors Latitude, Longitude, Period, and ID. I would like to calculate the positional centroid for each period (n = 2), weighted by the number of observations for each unique ID, so that IDs with fewer observations receive higher weights than those with more observations.
My dataframe has 300,000 observations, but it looks something like this:
dat <- data.frame(Latitude = c(35.8, 35.85, 36.7, 35.2, 36.1, 35.859, 36.0, 37.0, 35.1, 35.2),
                  Longitude = c(-89.4, -89.5, -89.4, -89.8, -90, -89.63, -89.7, -89, -88.9, -89),
                  # Period and ID are character values, so they must be quoted
                  Period = c("early", "early", "early", "early", "early", "late", "late", "late", "late", "late"),
                  ID = c("A", "A", "A", "B", "C", "C", "C", "D", "E", "E"))
I can easily calculate the unweighted mean for the early and late periods using aggregate:
centroid <- aggregate(cbind(Longitude, Latitude) ~ Period, dat, mean)
But is there a way to calculate the centroid for each period weighted by the number of observations for each ID, so that IDs with more observations do not bias the mean? If possible, an elegant way of doing this inside the aggregate function, or a dplyr solution, would also be helpful.
Any assistance would be much appreciated. Best,
Nick
Upvotes: 1
Views: 357
Reputation: 18561
If you want to calculate your own weights, based on the Period and ID groups, so that each ID has the same influence on the centroids by Period, then we just need to divide 1 by the number of observations in each Period-ID group. Every ID then contributes a total weight of 1 within its period. Below is the code using weighted.mean in dplyr::across.
library(dplyr)
dat %>%
  group_by(Period, ID) %>%
  mutate(weight = 1/n()) %>%
  group_by(Period) %>%
  summarise(across(c(Longitude, Latitude),
                   ~ weighted.mean(.x, w = weight)))
#> # A tibble: 2 x 3
#>   Period Longitude Latitude
#>   <chr>      <dbl>    <dbl>
#> 1 early      -89.7     35.8
#> 2 late       -89.2     36.0
# data
dat <- data.frame(Latitude = c(35.8, 35.85, 36.7, 35.2, 36.1, 35.859, 36.0, 37.0, 35.1, 35.2),
                  Longitude = c(-89.4, -89.5, -89.4, -89.8, -90, -89.63, -89.7, -89, -88.9, -89),
                  Period = rep(c("early", "late"), each = 5),
                  ID = c("A", "A", "A", "B", "C", "C", "C", "D", "E", "E"))
Created on 2021-08-26 by the reprex package (v0.3.0)
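Since the question also asks about aggregate, here is a minimal base R sketch of the same idea. With weights of 1/n per Period-ID group, the weighted mean within a period equals the plain mean of the per-ID means, so two aggregate calls should be enough. The intermediate name per_id is only illustrative, and this assumes there are no missing values in the coordinate columns.

```r
# Same example data as in the answer above.
dat <- data.frame(
  Latitude  = c(35.8, 35.85, 36.7, 35.2, 36.1, 35.859, 36.0, 37.0, 35.1, 35.2),
  Longitude = c(-89.4, -89.5, -89.4, -89.8, -90, -89.63, -89.7, -89, -88.9, -89),
  Period    = rep(c("early", "late"), each = 5),
  ID        = c("A", "A", "A", "B", "C", "C", "C", "D", "E", "E")
)

# Step 1: one mean position per ID within each period.
per_id <- aggregate(cbind(Longitude, Latitude) ~ Period + ID, data = dat, FUN = mean)

# Step 2: unweighted mean of those per-ID positions within each period,
# so every ID counts once regardless of how many rows it has.
centroid <- aggregate(cbind(Longitude, Latitude) ~ Period, data = per_id, FUN = mean)
centroid
```

On the example data this should agree with the dplyr result above after rounding. The two-step form may also be easier to inspect on the full data, because per_id shows the position each ID contributes.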
Upvotes: 1