Reputation: 71
I am trying to give coordinate values a label.
I have one data frame with coordinates, which looks similar to
lat lng
1 10
3 11
2 12
I have another data of grids and their top, bottom, left and right coordinates.
id left right bottom top
a 0.5 1.5 9 11
b 1.5 2.5 9 11
c 2.5 3.5 10 11
id columns are the id of each rectangular grid.
I would like to merge the id to the first dataframe, based on the range of the longitude and latitude of the grid.
So the first row of the expected output would be something like
lat lng id
1 10 a
How would I achieve this without specifying the range one by one. I was thinking of some kind of subset function, but I'm not sure how to code this.
I would appreciate some help.
Upvotes: 0
Views: 123
Reputation: 160417
FYI, "latitude" is generally North/South, which to me typically means "top" and "bottom" would make more sense. The code matches your frames, but it seems a little different.
This is a range-based join or a "non-equi" join. This is not natively supported in dplyr
, but it can be done in R with at least three other packages:
fuzzyjoin::fuzzy_left_join(
df1, df2,
by = c("lat" = "left", "lat" = "right", "lng" = "bottom", "lng" = "top"),
match_fun = list(`>=`, `<=`, `>=`, `<=`)
)[, c("lat", "lng", "id")]
# lat lng id
# 1 1 10 a
# 2 3 11 c
# 3 2 12 <NA>
(There's also fuzzyjoin::fuzzy_inner_join
if you prefer to drop the bottom row here.)
The "verb" functions in fuzzyjoin
are intentionally (I believe) named to coincide with the dplyr
join functions, so this can be done inline in a dplyr pipe:
library(dplyr)
df1 %>%
fuzzyjoin::fuzzy_left_join(
., df2,
by = c("lat" = "left", "lat" = "right", "lng" = "bottom", "lng" = "top"),
match_fun = list(`>=`, `<=`, `>=`, `<=`)
) %>%
select(lat, lng, id)
library(data.table)
DT1 <- as.data.table(df1)
DT2 <- as.data.table(df2)
DT2[DT1, on = .(left <= lat, right >= lat, bottom <= lng, top >= lng)][, .(lat=left, lng=top, id)]
# lat lng id
# <int> <int> <char>
# 1: 1 10 a
# 2: 3 11 c
# 3: 2 12 <NA>
I should note that the column names post-merge are not necessarily what one might expect: for instance, you won't see lat
or lng
, because the output column names are taken from the "right" of the merge, i.e. DT2
. The merge syntax of DT2[DT1,...]
is analogous to left_join(DT1, DT2, ...)
, so the names are opposite the side one might expect. There is no data.table::[
-syntax for inner or full merges, nor for taking the names from the other argument. (In my opinion, a left-merge should use the names from the "left" frame, but there is likely good reason it is what it is.)
SQL supports it natively, so if you're using a DBMS somewhere, the queries and join can be done there. Lacking that, the sqldf
package gives SQL access to local R objects (using the SQLite engine).
sqldf::sqldf(
"select df1.lat, df1.lng, df2.id
from df1
left join df2 on df1.lat between df2.left and df2.right
and df1.lng between df2.bottom and df2.top"
)
# lat lng id
# 1 1 10 a
# 2 3 11 c
# 3 2 12 <NA>
Data
df1 <- structure(list(lat = c(1L, 3L, 2L), lng = 10:12), class = "data.frame", row.names = c(NA, -3L))
df2 <- structure(list(id = c("a", "b", "c"), left = c(0.5, 1.5, 2.5), right = c(1.5, 2.5, 3.5), bottom = c(9L, 9L, 10L), top = c(11L, 11L, 11L)), class = "data.frame", row.names = c(NA, -3L))
Upvotes: 1