Reputation: 95
I have a dataframe (addresses
) that has Latitude / Longitude information for 6 locations. I've also added (for illustrative purposes) the country in which this location actually belongs to (I don't have this in my real data). I then convert this dataframe to sf object (addresses_sf
) using the st_as_sf
function.
I then get the sf object of the Canadian borders via the rnaturalearth
package and save this as can
.
I want to determine which entries in address_sf
are in Canada so I use the st_intersection
function from the sf package.
Unfortunately, when this runs, it only identifies the 2nd entry (45.41126, -75.70591) as being in the Canadian borders and fails to identify the remaining 3 entries in the results.
I've attached some reproducible code below in the hopes it may identify what I'm doing wrong...
## Load necessary libraries
library(tidyverse)
library(rnaturalearth)
library(sf)
## Create data
addresses <- tibble::tribble(
~lat, ~lon, ~country,
43.77264, -88.4462, "usa",
45.41126, -75.70591, "canada",
42.235, -83.06181, "canada",
45.38441, -70.84368, "canada",
42.11125, -83.1107, "canada",
44.74441, -74.86597, "usa"
)
## Convert to sf
addresses_sf <- addresses %>%
rowid_to_column() %>%
st_as_sf(coords = c("lon", "lat"), crs = 4326)
## Get Canadian borders
can <- rnaturalearth::ne_countries(country = "Canada", returnclass = "sf") %>%
select(name) %>%
st_transform(crs = 4326)
## Determine which entries are in Canada
sf::st_intersection(addresses_sf, can)
Update:
Thanks to @Ray for highlighting the error in the above script as being one of not having adequate resolution. I updated the call to rnaturalearth with the line below and the code is now functioning as expected.
## Get Canadian borders
can <- rnaturalearth::ne_countries(country = "Canada", scale = "large", returnclass = "sf") %>%
select(name) %>%
st_transform(crs = 4326)
Upvotes: 3
Views: 939
Reputation: 2288
@Tony Machado, you are running into a resolution issue here. The borderline of the polygon (i.e. Canada) from naturalearth comes in a resolution that may cut the borderline. If you plot your problem at hand, you will see that some of the cities are close to the border.
library(ggplot2)
ggplot() +
geom_sf(data = can, fill = "lightblue") +
geom_sf(data = addresses_sf, color = "red", size = 2)
You can either
scale = "large"
in your ne_countries()
call); or workFor the buffering it might be worth reprojecting your WGS84 to UTM and work with a finer distance measure. In WGS84 the dist
is in arc degrees. That can be painful.
But from below, you can see that you increase your includes
when we buffer the Canadian borderline. Unfortunately, this catches also "Detroit" :(
Hope this gets you going.
sf::st_intersection(addresses_sf, st_buffer(can, dist = 0.001))
Simple feature collection with 5 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -83.1107 ymin: 42.11125 xmax: -70.84368 ymax: 45.41126
Geodetic CRS: WGS 84
# A tibble: 5 x 4
rowid country name geometry
* <int> <chr> <chr> <POINT [°]>
1 2 canada Canada (-75.70591 45.41126)
2 3 canada Canada (-83.06181 42.235)
3 4 canada Canada (-70.84368 45.38441)
4 5 canada Canada (-83.1107 42.11125)
5 6 usa Canada (-74.86597 44.74441)
Upvotes: 3