Reputation: 49
I have a dataset of specific locations that have longitude and latitude data on them. I want to create a new column that tells me which neighborhoods those points fall on.
My dataset (df) has over 50,000 locations, some are duplicated.
For example:
Location Longitude Latitude Neighborhood
Fenway Park -71.0972 42.3467 Fenway-Kenmore
I also have the shapefiles for the neighborhood that you can find here https://data.boston.gov/group/geospatial
Upvotes: 0
Views: 341
Reputation: 780
You are looking for a spatial join. Since your data are in lat/long you need to project to the neighborhood coordinates so that your data line up. Lat/long are not great when dealing with units of measure.
You then need to do a spatial join of the points using st_within which will merge the attributes if they fall within the neighborhood polygons. From there you can subset the columns to your desired output.
Note the default is a left join so make sure your data frame comes first. You didn't provide any data, so I don't know if this will 100% work for all cases in your data.
Edit: I found that the neighborhoods shapefile from that site had some invalid geometries. You probably want to fix that by st_make_valid()
. I also added an example with some dummy data and the neighborhood shapefile. Duplicates shouldn't matter you can deal with them however you like.
library(sf)
# read in neighborhoods shapefile
neighborhoods <- st_read('Boston_Neighborhoods.shp')%>%
st_make_valid()
# convert df to spatial dataframe and reproject
spdf <- st_as_sf(df, coords=c("Longitude", "Latitude"), crs=st_crs("EPSG:4326")) %>%
st_transform(crs=st_crs(neighborhoods))
# do the spatial join
output <- st_join(spdf, neighborhoods, join=st_within)
Here's an example with some dummy data.
plist <- as.data.frame(do.call(rbind, list(c(756363.463294525, 2926358.57093617),
c(756363.463294525, 2926358.57093617),
c(761610.797283232, 2937282.14778728),
c(763204.448665894, 2946825.83819385),
c(763085.235486707, 2950007.83174181))))
pts <- st_as_sf(plist,
crs=st_crs(neighborhoods),
coords=c(1,2))
st_join(pts, neighborhoods, join=st_within)
# Simple feature collection with 5 features and 7 fields
# geometry type: POINT
# dimension: XY
# bbox: xmin: 756363.5 ymin: 2926359 xmax: 763204.4 ymax: 2950008
# projected CRS: NAD83 / Massachusetts Mainland (ftUS)
# OBJECTID Name Acres Neighborho SqMiles ShapeSTAre ShapeSTLen geometry
# 1 27 Roslindale 1605.5682 15 2.51 69938273 53563.91 POINT (756363.5 2926359)
# 2 27 Roslindale 1605.5682 15 2.51 69938273 53563.91 POINT (756363.5 2926359)
# 3 28 Jamaica Plain 2519.2454 11 3.94 109737890 56349.94 POINT (761610.8 2937282)
# 4 29 Mission Hill 350.8536 13 0.55 15283120 17918.72 POINT (763204.4 2946826)
# 5 30 Longwood 188.6119 28 0.29 8215904 11908.76 POINT (763085.2 2950008)
Upvotes: 2