imdinos
imdinos

Reputation: 49

How do I assign my specific locations to their neighborhoods?

I have a dataset of specific locations that have longitude and latitude data on them. I want to create a new column that tells me which neighborhoods those points fall on.

My dataset (df) has over 50,000 locations, some are duplicated.

For example:

Location          Longitude        Latitude          Neighborhood
Fenway Park       -71.0972          42.3467         Fenway-Kenmore

I also have the shapefiles for the neighborhood that you can find here https://data.boston.gov/group/geospatial

Upvotes: 0

Views: 341

Answers (1)

at80
at80

Reputation: 780

You are looking for a spatial join. Since your data are in lat/long you need to project to the neighborhood coordinates so that your data line up. Lat/long are not great when dealing with units of measure.

You then need to do a spatial join of the points using st_within which will merge the attributes if they fall within the neighborhood polygons. From there you can subset the columns to your desired output.

Note the default is a left join so make sure your data frame comes first. You didn't provide any data, so I don't know if this will 100% work for all cases in your data.

Edit: I found that the neighborhoods shapefile from that site had some invalid geometries. You probably want to fix that by st_make_valid(). I also added an example with some dummy data and the neighborhood shapefile. Duplicates shouldn't matter you can deal with them however you like.

library(sf)

# read in neighborhoods shapefile
neighborhoods <- st_read('Boston_Neighborhoods.shp')%>% 
                 st_make_valid()

# convert df to spatial dataframe and reproject
spdf <- st_as_sf(df, coords=c("Longitude", "Latitude"), crs=st_crs("EPSG:4326")) %>% 
        st_transform(crs=st_crs(neighborhoods))

# do the spatial join
output <- st_join(spdf, neighborhoods, join=st_within)

Here's an example with some dummy data.

plist <- as.data.frame(do.call(rbind, list(c(756363.463294525, 2926358.57093617), 
                                           c(756363.463294525, 2926358.57093617),
                                           c(761610.797283232, 2937282.14778728),
                                           c(763204.448665894, 2946825.83819385),
                                           c(763085.235486707, 2950007.83174181))))

pts <- st_as_sf(plist, 
                crs=st_crs(neighborhoods),
                coords=c(1,2))

st_join(pts, neighborhoods, join=st_within)
# Simple feature collection with 5 features and 7 fields
# geometry type:  POINT
# dimension:      XY
# bbox:           xmin: 756363.5 ymin: 2926359 xmax: 763204.4 ymax: 2950008
# projected CRS:  NAD83 / Massachusetts Mainland (ftUS)
# OBJECTID          Name     Acres Neighborho SqMiles ShapeSTAre ShapeSTLen                 geometry
# 1       27    Roslindale 1605.5682         15    2.51   69938273   53563.91 POINT (756363.5 2926359)
# 2       27    Roslindale 1605.5682         15    2.51   69938273   53563.91 POINT (756363.5 2926359)
# 3       28 Jamaica Plain 2519.2454         11    3.94  109737890   56349.94 POINT (761610.8 2937282)
# 4       29  Mission Hill  350.8536         13    0.55   15283120   17918.72 POINT (763204.4 2946826)
# 5       30      Longwood  188.6119         28    0.29    8215904   11908.76 POINT (763085.2 2950008)

Upvotes: 2

Related Questions