Tim
Tim

Reputation: 563

Spatial join of two GeoDataFrames according to distance

I have two geodataframes/geoseries, one of points and one of polygons.

I would like to keep the points that are at a certain distance in meters from at least one polygon. Is this possible? Specially with built-in methods of geopandas. Or maybe there is a way to do it with the Rtree package? To do this in an efficient way in case the two geoseries are very large.

In general, what is the best way to do this kind of sorting for two different geoseries? Based on the distance.

Here is an example. The polygons are in blue and the points in red, with a black check near the ones we keep at a certain distance and a red cross for the ones we do not keep. enter image description here

Thanks for your advice !

Upvotes: 1

Views: 893

Answers (1)

Rob Raymond
Rob Raymond

Reputation: 31146

  • this demonstrates adding a buffer to polygons then spatial join
  • have used UK administrational areas as polygons
  • have used UK hospitals as points
  • it's important to use CRS systems correctly for distance. UTM for distance. In this example using 10km buffer
import requests, io, json
import geopandas as gpd
import shapely.geometry
import pandas as pd

# UK administrational area boundaries
res = requests.get(
    "https://opendata.arcgis.com/datasets/69dc11c7386943b4ad8893c45648b1e1_0.geojson"
)
gdfp = gpd.GeoDataFrame.from_features(res.json()["features"], crs="CRS84").pipe(
    lambda d: d.rename(columns={c: c.lower() for c in d.columns})
).rename(columns={"lad20cd": "areaCode","lad20nm":"areaName"})

# just a few areas...
gdfp = gdfp.loc[gdfp["areaName"].str.contains("County")]

# get some public addressess - hospitals.  data that can be scattered
dfhos = pd.read_csv(io.StringIO(requests.get("http://media.nhschoices.nhs.uk/data/foi/Hospital.csv").text),
    sep="¬",engine="python",)


# create a geo dataframe of hospitals
gdfhos = gpd.GeoDataFrame(
    data=dfhos,
        geometry=dfhos.apply(lambda r: shapely.geometry.Point(r["Longitude"],r["Latitude"]), axis=1), crs="EPSG:4326"
    )

# join with a buffer, use UTM so diatance makes sense
BUFFER = 10**4 # 10 km
gdfbuf = gdfp.copy()
gdfbuf.geometry = gdfp.geometry.to_crs(gdfp.estimate_utm_crs()).buffer(BUFFER).to_crs("EPSG:4326")
gdfbuf = gpd.sjoin(gdfbuf, gdfhos)

demonstrate it has worked

  • use a plotly scatter_mapbox plotting hospitals that are within 10km of administrational boundary polygon
  • have added layers to show original boundary and buffered boundary
import plotly.express as px

px.scatter_mapbox(
    gdfbuf, lat="Latitude", lon="Longitude",
    hover_data=["OrganisationName","Postcode"]
).update_layout(
    mapbox={
        "style": "open-street-map",
        "zoom": 5,
        "layers": [
            {
                "source": json.loads(gdfp.geometry.to_json()),
                "below": "traces",
                "type": "line",
                "color": "purple",
                "line": {"width": 1.5},
            },
            {
                "source": json.loads(gdfbuf.geometry.to_json()),
                "below": "traces",
                "type": "line",
                "color": "yellow",
                "line": {"width": 1.5},
            },
        ],
    },
    margin={"l": 0, "r": 0, "t": 0, "b": 0},
)

enter image description here

Upvotes: 1

Related Questions