Reputation: 1829
I have a data frame (df) with three columns, like so (all numbers are random):
ID Lat Lon
1 25.32 -63.32
1 25.29 -64.21
1 24.12 -62.43
2 12.42 54.64
2 12.11 53.43
. .... ....
Basically I wanted to have the centroid per ID like so:
ID Lat Lon Cent_lat Cent_lon
1 25.32 -63.32 25.31 -63.25
1 25.29 -64.21 25.31 -63.25
1 24.12 -62.43 25.31 -63.25
2 12.42 54.64 12.20 53.60
2 12.11 53.43 12.20 53.60
I tried the following:
library(geosphere)
library(rgeos)
library(dplyr)
df1 <- by(df,df$ID,centroid(df$Lat, df$Long))
But this gave me this error:
Error in (function (classes, fdef, mtable): unable to find an inherited method for function ‘centroid’ for signature ‘"numeric"’
I also tried
df1 <- by(df,df$ID,centroid(as.numeric(df$Lat), as.numeric(df$Long)))
But this gave me this error:
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘centroid’ for signature ‘"function"’
Upvotes: 0
Views: 6335
Reputation: 1495
Here's a data.table approach. As @czeinerb mentioned, Lon is the first argument of the centroid function and Lat is the second. We wrap centroid in a helper
function below so that, in the data.table aggregation, it receives a matrix with 2 columns (Lon|Lat), which is the input that geosphere's centroid
function requires.
# Import packages
library(geosphere)
library(data.table) # Using a data.table approach
# Sample data
df = data.frame("ID" = c(1, 1, 1, 2, 2, 2), "Lat" = c(25.32, 25.29, 24.12, 12.42, 12.11, 12.22), "Lon" = c(-63.32, -64.21, -62.43, 54.64, 53.43, 53.23))
df
ID Lat Lon
1 1 25.32 -63.32
2 1 25.29 -64.21
3 1 24.12 -62.43
4 2 12.42 54.64
5 2 12.11 53.43
6 2 12.22 53.23
# Convert to data.table
setDT(df)
# Re-define centroid function - Lon is first argument and Lat is second
# Geosphere takes a matrix with two columns: Lon|Lat, so we use cbind to coerce the data to this form
findCentroid <- function(Lon, Lat, ...){
centroid(cbind(Lon, Lat), ...)
}
# Find centroid Lon and Lat by ID, as required
df[, c("Cent_lon", "Cent_lat") := as.list(findCentroid(Lon, Lat)), by = ID]
df
ID Lat Lon Cent_lon Cent_lat
1: 1 25.32 -63.32 -63.32000 24.91126
2: 1 25.29 -64.21 -63.32000 24.91126
3: 1 24.12 -62.43 -63.32000 24.91126
4: 2 12.42 54.64 53.76667 12.25003
5: 2 12.11 53.43 53.76667 12.25003
6: 2 12.22 53.23 53.76667 12.25003
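For readers who prefer dplyr (which the question already loads), the same per-ID step can be sketched with group_by and mutate. This is only an illustration under the same assumptions as above: the columns are named ID, Lat and Lon, and each ID has enough points for centroid to return a value. The name res is made up for the example.

```r
library(geosphere)
library(dplyr)

# Same sample data as in the data.table example
df <- data.frame(
  ID  = c(1, 1, 1, 2, 2, 2),
  Lat = c(25.32, 25.29, 24.12, 12.42, 12.11, 12.22),
  Lon = c(-63.32, -64.21, -62.43, 54.64, 53.43, 53.23)
)

# Within each ID, build the Lon|Lat matrix and pick the two values
# out of the 1 x 2 matrix that centroid() returns (lon first, lat second)
res <- df %>%
  group_by(ID) %>%
  mutate(
    Cent_lon = centroid(cbind(Lon, Lat))[1, 1],
    Cent_lat = centroid(cbind(Lon, Lat))[1, 2]
  ) %>%
  ungroup()

res
```

centroid is called twice per group here to keep the sketch short; for large data it may be worth computing it once per group and joining the result back.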
Upvotes: 3
Reputation: 379
The centroid function of the geosphere package takes a matrix as its data argument: "Arguments: x a 2-column matrix (longitude/latitude)"
https://cran.r-project.org/web/packages/geosphere/geosphere.pdf
Also, longitude is the first and latitude is the second column, not the other way around :)
So the code in your case could be like:
library(geosphere)
df <- data.frame(ID = c(1,1,1,2,2,2,2)
, Lon = c(-63.32, -64.43, -62.43, 54.64, 53.43, 54.64, 53.43)
, Lat = c(25.32, 25.29, 24.12, 12.42, 12.11, 11.11, 10.55))
mx <- as.matrix(df)
(mx1 <- by(mx[,2:3], mx[,1], centroid))
With the output:
> INDICES: 1
> lon lat
> [1,] -63.39333 24.91126
> -----------------------------------------------------------------
> INDICES: 2
> lon lat
> [1,] Inf 90
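Note that the result for ID 2 (Inf, 90) is not a usable centroid; the output alone does not show why. If a plain per-ID average of the coordinates is good enough for your purpose, a minimal base R sketch is below. This is a different technique from geosphere's polygon centroid: it is the arithmetic mean of the points, which is an assumption on my part about what is acceptable, and it will mislead for points that straddle the antimeridian. The column names Mean_lon and Mean_lat are made up for the example.

```r
# Same sample data as above
df <- data.frame(
  ID  = c(1, 1, 1, 2, 2, 2, 2),
  Lon = c(-63.32, -64.43, -62.43, 54.64, 53.43, 54.64, 53.43),
  Lat = c(25.32, 25.29, 24.12, 12.42, 12.11, 11.11, 10.55)
)

# ave() applies mean() within each ID and returns a vector
# of the same length as the input, so it can be assigned directly
df$Mean_lon <- ave(df$Lon, df$ID)
df$Mean_lat <- ave(df$Lat, df$ID)

df
```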
Upvotes: 1
Reputation: 5152
To use centroid you need polygons with longitude and latitude, in that order. See this example:
df<-rbind(c(-180,-20), c(-160,5), c(-60, 0), c(-160,-60), c(-180,-20),
c(-100,-50), c(-160,-60), c(-180, -10), c(-160,10), c(-60,0),c(-100,-50))
df<-data.frame(ID=rep(c(1,2),times=c(5,6)),Lon=df[,1],Lat=df[,2])
df1 <- by(df[,c("Lon", "Lat")],df$ID,centroid)
df1
df[,c("Cent_lon","Cent_lat")]<-NA
for (i in names(df1)) {
# centroid() returns a 1 x 2 matrix (lon, lat); assign each value to its own column
df[df$ID == i, "Cent_lon"] <- df1[[i]][1, 1]
df[df$ID == i, "Cent_lat"] <- df1[[i]][1, 2]
}
df
ID Lon Lat Cent_lon Cent_lat
1 1 -180 -20 -133.3333 -23.89340
2 1 -160 5 -133.3333 -23.89340
3 1 -60 0 -133.3333 -23.89340
4 1 -160 -60 -133.3333 -23.89340
5 1 -180 -20 -133.3333 -23.89340
6 2 -100 -50 -127.6607 -26.10686
7 2 -160 -60 -127.6607 -26.10686
8 2 -180 -10 -127.6607 -26.10686
9 2 -160 10 -127.6607 -26.10686
10 2 -60 0 -127.6607 -26.10686
11 2 -100 -50 -127.6607 -26.10686
You can use plotArrows to see the polygon:
pol<-split(df[,2:3],df$ID)
#plotArrows(pol[[1]])
plotArrows(as.matrix(pol[[1]]))
points(df1[[1]],col=4)
Upvotes: 3
Reputation: 78792
library(geosphere)
library(ggplot2)
library(dplyr)
states <- map_data("state")
head(states)
## long lat group order region subregion
## 1 -87.46201 30.38968 1 1 alabama <NA>
## 2 -87.48493 30.37249 1 2 alabama <NA>
## 3 -87.52503 30.37249 1 3 alabama <NA>
## 4 -87.53076 30.33239 1 4 alabama <NA>
## 5 -87.57087 30.32665 1 5 alabama <NA>
## 6 -87.58806 30.32665 1 6 alabama <NA>
cntrd <- function(x) {
data.frame(centroid(as.matrix(x[,c("long", "lat")])))
}
by(states, states$group, cntrd) %>% head()
## $`1`
## lon lat
## 1 -86.82976 32.82735
##
## $`2`
## lon lat
## 1 -111.6698 34.34309
##
## $`3`
## lon lat
## 1 -92.43826 34.92167
##
## $`4`
## lon lat
## 1 -119.6713 37.40289
##
## $`5`
## lon lat
## 1 -105.5526 39.02653
##
## $`6`
## lon lat
## 1 -72.72553 41.62706
group_by(states, group) %>%
do(cntrd(.))
## Source: local data frame [63 x 3]
## Groups: group [63]
##
## group lon lat
## <dbl> <dbl> <dbl>
## 1 1 -86.82976 32.82735
## 2 2 -111.66978 34.34309
## 3 3 -92.43826 34.92167
## 4 4 -119.67130 37.40289
## 5 5 -105.55264 39.02653
## 6 6 -72.72553 41.62706
## 7 7 -75.51543 39.00879
## 8 8 -77.03411 38.91083
## 9 9 -82.51260 28.69498
## 10 10 -83.46361 32.67562
## # ... with 53 more rows
Upvotes: 3
Reputation: 318
?centroid says that it only takes a 2-column matrix as its argument. The ID column in your data makes the matrix three columns.
df <- rbind(c(25.32,-63.32),c(25.29,-64.32),c(24.12,-62.43),c(12.42,54.64),c(12.11,53.43))
centroid(df)
lon lat
[1,] 24.27109 -60.37098
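To get one centroid per ID rather than one for the whole table, the ID column can be used for splitting and left out of the matrix. The sketch below is one possible way to do that in base R, not the only one. It assumes the columns are named ID, Lat and Lon, puts longitude first as the documentation asks, and uses three points per ID (the sample data from the data.table answer). The names groups, cents and idx are made up for the example.

```r
library(geosphere)

df <- data.frame(
  ID  = c(1, 1, 1, 2, 2, 2),
  Lat = c(25.32, 25.29, 24.12, 12.42, 12.11, 12.22),
  Lon = c(-63.32, -64.21, -62.43, 54.64, 53.43, 53.23)
)

# Keep only the two coordinate columns, longitude first, and split by ID
groups <- split(df[, c("Lon", "Lat")], df$ID)

# centroid() gets a 2-column matrix for each ID; the result has one row per ID
cents <- t(sapply(groups, function(g) centroid(as.matrix(g))))
colnames(cents) <- c("Cent_lon", "Cent_lat")

# Look up each row's centroid by its ID
idx <- as.character(df$ID)
df$Cent_lon <- unname(cents[idx, "Cent_lon"])
df$Cent_lat <- unname(cents[idx, "Cent_lat"])

df
```

Passing a function (here an anonymous one) is also what by() expects as its third argument; in the question, centroid(df$Lat, df$Long) is evaluated on a numeric vector before by() ever runs, which is consistent with the first error message.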
Upvotes: 0