Reputation: 455
I have a dataset that looks like this:
site lat long
bras2 41.21 -115.11
tex4 45.3 -112.31
bras2 41.15 -115.15
bras2 41.12 -115.19
For samples with the same site
name, I want to calculate their centre point and then add it as a column to the dataset. Some site
names are duplicated twice, other three times, other four times.
Like this:
site lat long centre_lat centre_long
bras2 41.21 -115.11 value here value here
tex4 45.3 -112.31 45.3 -112.31
bras2 41.15 -115.15 value here value here
bras2 41.12 -115.19 value here value here
How can I do this?
Upvotes: 6
Views: 2408
Reputation: 5499
If you're using spatial data, you should look into using the sf
package. It handles geometries and functions for operating on them well.
Code below shows using both sf::st_centroid
and geosphere::centroid
. I prefer sf
's way of doing things.
df <- read.table(header=TRUE, text= "site lat long
bras2 41.21 -115.11
tex4 45.3 -112.31
bras2 41.15 -115.15
bras2 41.12 -115.19")
library(dplyr)
library(geosphere)
library(sf)
# Using sf's st_centroid
df_sf <- st_as_sf(df, coords = c('long', 'lat'))
centroids_sf <- df_sf %>%
group_by(site) %>%
summarize(geometry = st_union(geometry)) %>%
st_centroid
# Using geosphere::centroid
centroids_geoshpere <- df_sf %>%
group_by(site) %>%
filter(n() >2) %>% ## geosphere needs polygons therefore 3+ points
st_union() %>%
st_cast('POLYGON') %>%
as('Spatial') %>% # geoshpere expects SpatialPolygons objects
centroid()
centroids_geoshpere
#> [,1] [,2]
#> [1,] -115.15 41.16001
centroids_sf
#> Simple feature collection with 2 features and 1 field
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: -115.15 ymin: 41.16 xmax: -112.31 ymax: 45.3
#> CRS: NA
#> # A tibble: 2 x 2
#> site geometry
#> * <chr> <POINT>
#> 1 bras2 (-115.15 41.16)
#> 2 tex4 (-112.31 45.3)
Looks like thery're close enough to the same point. I don't think geosphere::centroid
can give a centroid for a single point, but may be wrong. sf::st_centroid
has no problem with 1,2, or more points.
Created on 2020-12-20 by the reprex package (v0.3.0)
Upvotes: 9
Reputation: 24079
The geosphere package has a function centroid
to solve problems such as this one.
It as long as there more than one point in shape it is straight forward. Most of the code below involves handling the single point case in the example above.
df <- read.table(header=TRUE, text= "site lat long
bras2 41.21 -115.11
tex4 45.3 -112.31
bras2 41.15 -115.15
bras2 41.12 -115.19")
library(dplyr)
library(geosphere)
df %>% group_by(side) %>% centroid(.[ ,c(3,2)])
sites <- split(df, df$site)
results <-lapply(sites, function(x) {
if(nrow(x)>1 ) {
value <- as.data.frame(centroid(x[, c(3,2)]))
}
else {
value <- x[1, c(3,2)]
names(value) <- c("lon", "lat")
}
value$site <- x$site[1]
value
})
answer<-bind_rows(results)
lon lat site
1 -115.15 41.16001 bras2
2 -112.31 45.30000 tex4
Upvotes: 2
Reputation: 72683
You could calculate the means grouped by site names using ave
after stripping off the site numbers using gsub
.
within(dat, {
g <- gsub("\\d", "", site)
mid.lat <- ave(lat, g)
mid.long <- ave(long, g)
rm(g)
})
# site lat long mid.long mid.lat
# 1 bras2 41.21 -115.11 -115.150 41.160
# 2 tex4 45.30 -112.31 -112.310 45.300
# 3 bras2 41.15 -115.15 -115.150 41.160
# 4 bras2 41.12 -115.19 -115.150 41.160
# 5 foo1 42.10 -123.10 -123.225 42.225
# 6 foo2 42.20 -123.20 -123.225 42.225
# 7 foo11 42.30 -123.30 -123.225 42.225
# 8 foo12 42.30 -123.30 -123.225 42.225
Or, if you depend on the NA
:
within(dat, {
g <- gsub("\\d", "", site)
n <- ave(site, g, FUN=length)
mid.lat <- NA
mid.long <- NA
mid.lat[n > 1] <- ave(lat[n > 1], g[n > 1])
mid.long[n > 1] <- ave(long[n > 1], g[n > 1])
rm(g, n)
})
# site lat long mid.long mid.lat
# 1 bras2 41.21 -115.11 -115.150 41.160
# 2 tex4 45.30 -112.31 NA NA
# 3 bras2 41.15 -115.15 -115.150 41.160
# 4 bras2 41.12 -115.19 -115.150 41.160
# 5 foo1 42.10 -123.10 -123.225 42.225
# 6 foo2 42.20 -123.20 -123.225 42.225
# 7 foo11 42.30 -123.30 -123.225 42.225
# 8 foo12 42.30 -123.30 -123.225 42.225
Data:
dat <- structure(list(site = c("bras2", "tex4", "bras2", "bras2", "foo1",
"foo2", "foo11", "foo12"), lat = c(41.21, 45.3, 41.15, 41.12,
42.1, 42.2, 42.3, 42.3), long = c(-115.11, -112.31, -115.15,
-115.19, -123.1, -123.2, -123.3, -123.3)), class = "data.frame", row.names = c(NA,
-8L))
Upvotes: 2