Reputation: 63
I have a dataframe with longitude, latitude and temperature columns, and a separate geojson file downloaded from here representing an area that fits within all the coordinates in my df.
How do I filter the temperature df so that I only have records of areas that fit within the geojson file?
I have no idea where to start on this so haven't attempted anything yet!
For example, in the temperature df:
dput(temperature_df)
structure(list(Lat = c(52, 53.23, 51.37), Long = c(-1.89, -2.45,
-2.1), Temp = c(13.67, 10.38, -2.45)), class = "data.frame", row.names = c(NA,
-3L))
I am expecting to return the filtered df:
Lat | Long | Temp |
---|---|---|
52.00 | -1.89 | 13.67 |
53.23 | -2.45 | 10.38 |
Upvotes: 0
Views: 408
Reputation: 6885
Preamble
To ensure the validity of your results, it is essential that you know which coordinate system (CRS) forms the basis of the coordinates for both of your datasets. You state that setting a CRS of WGS84/EPSG:4326 correctly plotted your points, but unless you know for a fact that your point coordinates are WGS84, it can cause erroneous results. See this answer for a more in-depth discussion on why you cannot just assume.
This solution is based on the assumption that the coordinates in your temperature df are WGS84/EPSG:4326. The CRS of the geojson file is OSGB36/British National Grid. Given the data are from the UK ONS, we can be 100% certain the CRS is correct so it will be used as the base CRS in this example.
You will not achieve the desired result based on the sample data you provided as all of your points are outside the West Midlands region, so I have selected the South West region in order to generate a representative result.
Load required packages and data
library(sf)
library(dplyr)
library(ggplot2)
# Read and subset geojson file from your working directory, downloaded from
# https://geoportal.statistics.gov.uk/datasets/7c23fbe8e89d4cf79ff7f2a6058e6200_0/explore
sf_poly <- read_sf("Regions_December_2023_Boundaries_EN_BFC_4211903881402860639.geojson") %>%
filter(RGN23NM == "South West")
sf_poly["RGN23NM"]
# Simple feature collection with 1 feature and 1 field
# Geometry type: MULTIPOLYGON
# Dimension: XY
# Bounding box: xmin: 82668.52 ymin: 5336.966 xmax: 435920 ymax: 246052.2
# Projected CRS: OSGB36 / British National Grid
# # A tibble: 1 × 2
# RGN23NM geometry
# <chr> <MULTIPOLYGON [m]>
# 1 South West (((83962.84 5401.15, 83970.68 5400.71, 83977.49 5404.05, 83985.27 540…
Now create an sf of your point data:
# Your point data (assumed to be WGS84/EPSG:4326)
points <- read.table(text = "Lat Long Temp
52.00 -1.89 13.67
53.23 -2.45 10.38
51.37 -2.10 -2.45", header = TRUE) %>%
mutate(ID = 1:n()) # Added a unique ID here for illustrative purposes
# Create sf from your df, assign assumed WGS84 CRS, then project to CRS of sf_poly
sf_points <- st_as_sf(points, coords = c("Long", "Lat"), crs = 4326) %>%
st_transform(st_crs(sf_poly))
sf_points
# Simple feature collection with 3 features and 2 fields
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: 370057.6 ymin: 163442.6 xmax: 407648.6 ymax: 370423.5
# Projected CRS: OSGB36 / British National Grid
# Temp ID geometry
# 1 13.67 1 POINT (407648.6 233512.2)
# 2 10.38 2 POINT (370057.6 370423.5)
# 3 -2.45 3 POINT (393135.4 163442.6)
Note that sf_poly and sf_points have the same CRS, but as stated, you can only be certain that this is valid if your original points coordinates are WGS84. If you know they are in a different CRS, modify crs = 4326
to match the actual CRS. Also, be aware of the order that longitude and latitude are declared in e.g. coords = c("Long", "Lat")
. The longitude/x-axis values need to be declared first. If you applied the code from the other answer, your points would be plotted off the coast of Somalia.
Subset your points_sf using poly_sf
I have included two options:
st_intersection()
, only points within poly_sf are returned. You can safely ignore the warning:# Return sf_points only if they fall within sf_poly
sf_inter <- st_intersection(sf_points, sf_poly)
# Warning message:
# attribute variables are assumed to be spatially constant throughout all geometries
sf_inter[,c("ID", "RGN23NM")]
# Simple feature collection with 2 features and 2 fields
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: 393135.4 ymin: 163442.6 xmax: 407648.6 ymax: 233512.2
# Projected CRS: OSGB36 / British National Grid
# ID RGN23NM geometry
# 1 1 South West POINT (407648.6 233512.2)
# 3 3 South West POINT (393135.4 163442.6)
st_join()
, all points are returned, but each point is identified as inside or outside sf_poly:# Assign values to all sf_points
sf_join <- st_join(sf_points, sf_poly) %>%
mutate(RGN23NM = if_else(is.na(RGN23NM), "Outside Area", RGN23NM))
sf_join[,c("ID", "RGN23NM")]
# Simple feature collection with 3 features and 2 fields
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: 370057.6 ymin: 163442.6 xmax: 407648.6 ymax: 370423.5
# Projected CRS: OSGB36 / British National Grid
# ID RGN23NM geometry
# 1 1 South West POINT (407648.6 233512.2)
# 2 2 Outside Area POINT (370057.6 370423.5)
# 3 3 South West POINT (393135.4 163442.6)
And the result, using sf_join for illustrative purposes:
ggplot() +
geom_sf(data = sf_poly) +
geom_sf(data = sf_join, aes(colour = RGN23NM), size = 2) +
scale_colour_discrete(name = "Temperature\nPoints")
Upvotes: 2
Reputation: 7400
sf::st_as_sf()
and sf::st_intersects()
are your friends.
As you do not provide data, we use a .geojson
file taken from here.
tmp = tempfile(fileext = ".geojson")
download.file("https://raw.githubusercontent.com/gregoiredavid/france-geojson/master/communes.geojson",
tmp)
library(sf)
gj = read_sf(tmp)
gj = gj[1L, ] # use just the first polygon
# some points randomly created around centre of gj
points = structure(list(lat = c(3.30454190848165, 3.29946780571954, 3.29612845330439, 3.27991321713169, 3.30485103998978), long = c(46.1807449149569, 46.1254499991144, 46.137126928633, 46.1617752630819, 46.1662531731711), temp = c(-6.46641247389143, -0.910962931755058, 32.8972983969805, 24.8380473564282, 11.9387508544478)), row.names = c(NA, -5L), class = "data.frame")
points
to points_sf
Set crs in accordance to the polygon's crs.
# convert to sf object
points_sf = st_as_sf(x = points, coords = c("lat", "long"), crs = st_crs(gj))
> points_sf
Simple feature collection with 5 features and 1 field
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3.279913 ymin: 46.12545 xmax: 3.304851 ymax: 46.18074
Geodetic CRS: WGS 84
temp geometry
1 -6.4664125 POINT (3.304542 46.18074)
2 -0.9109629 POINT (3.299468 46.12545)
3 32.8972984 POINT (3.296128 46.13713)
4 24.8380474 POINT (3.279913 46.16178)
5 11.9387509 POINT (3.304851 46.16625)
Plot both:
library(ggplot2)
ggplot() +
geom_sf(data = gj) +
geom_sf(data = points_sf)
sf::st_intersects()
and subset()
points_sf$inside = as.integer(st_intersects(points_sf, gj))
# filter:
points_sf = subset(points_sf, subset = inside == 1L, select = -inside)
We obtain:
> points_sf
Simple feature collection with 3 features and 1 field
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3.279913 ymin: 46.13713 xmax: 3.304851 ymax: 46.16625
Geodetic CRS: WGS 84
temp geometry
3 32.89730 POINT (3.296128 46.13713)
4 24.83805 POINT (3.279913 46.16178)
5 11.93875 POINT (3.304851 46.16625)
Plot both with filtered points:
ggplot() +
geom_sf(data = gj) +
geom_sf(data = points_sf)
Potential re-conversion of filtered points_sf
:
cbind.data.frame(matrix(unlist(points_sf$geometry), ncol = 2L, byrow = TRUE),
points_sf$temp) |>
`colnames<-`(c("lat", "lon", "temp"))
lat lon temp
1 3.296128 46.13713 32.89730
2 3.279913 46.16178 24.83805
3 3.304851 46.16625 11.93875
Upvotes: 1