Reputation: 12394
I want to count how many points there are per polygon.
# Credits of this code go to: https://stackoverflow.com/questions/69642668/the-indices-of-the-two-geoseries-are-different-understanding-indices/69644010#69644010
import pandas as pd
import numpy as np
import geopandas as gpd
import shapely.geometry
import requests
# source some points and polygons
# fmt: off
dfp = pd.read_html("https://www.latlong.net/category/cities-235-15.html")[0]
dfp = gpd.GeoDataFrame(dfp, geometry=dfp.loc[:,["Longitude", "Latitude",]].apply(shapely.geometry.Point, axis=1))
res = requests.get("https://opendata.arcgis.com/datasets/69dc11c7386943b4ad8893c45648b1e1_0.geojson")
df_poly = gpd.GeoDataFrame.from_features(res.json())
# fmt: on
Now I sjoin the two. I use df_poly first, in order to add the points dfp to the GeoDataFrame df_poly.
df_poly.sjoin(dfp)
Now I want to count how many points there are per polygon.
I thought this would do it:
df_poly.sjoin(dfp).groupby('OBJECTID').count()
But that does not add a column to the GeoDataFrame df_poly with the count of each group.
Upvotes: 3
Views: 3601
Reputation: 43
Building on both your own answer and Rob Raymond's answer, I tried to write a more generic version as a function. Here it is:
def count_points_in_polygons(points, polygons, polygon_id, new_column="points_count"):
    # Save the index to restore it later
    original_index = polygons.index
    # Ensure polygon_id is a column, not the index
    if original_index.name == polygon_id:
        polygons = polygons.reset_index()
    # Count points in polygons
    points_in_polygon = (
        # Spatial join associates points and polygons that intersect each other
        polygons.sjoin(
            points,
            how="inner",  # Only keep points that match a polygon
        )
        .groupby(polygon_id)  # Group points by polygon
        .size()  # Get the number of points
        .rename(new_column)  # Name the column as it should appear in polygons
    )
    # Add the count series to the polygons dataframe
    polygons = (
        polygons.set_index(polygon_id)  # Ensure the index matches points_in_polygon
        .join(
            points_in_polygon,
            how="left",  # Keep polygons containing no points
        )
        .fillna({new_column: 0})  # Fill NaN with 0
    )
    if original_index.name != polygon_id:
        # Avoid duplicating polygon_id as both column and index
        polygons = polygons.reset_index()
        polygons = polygons.set_index(original_index)  # Restore the original index
    return polygons
In your specific case it could be called like this:
count_points_in_polygons(dfp, df_poly, "OBJECTID", new_column="n_points")
Upvotes: 2
Reputation: 12394
Building on the answer Fergus McClean provided, this can be done with even less code:
df_poly.merge(df_poly.sjoin(dfp).groupby('OBJECTID').size().rename('n_points').reset_index())
However, merge() defaults to an inner join, so polygons without any points are dropped. The .join() method proposed by Rob Raymond to combine the two dataframes keeps the entries that have no count.
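To make the difference concrete, here is a minimal sketch using plain pandas and made-up data. The polys and counts frames are hypothetical stand-ins for df_poly and the grouped counts, not the real dataset. It assumes that a left merge followed by fillna(0) is an acceptable way to keep the empty polygons:

```python
import pandas as pd

# Hypothetical stand-in for df_poly: three polygons identified by OBJECTID
polys = pd.DataFrame({"OBJECTID": [1, 2, 3], "name": ["a", "b", "c"]})

# Hypothetical stand-in for the grouped counts: polygon 3 contains no points
counts = pd.Series(
    [4, 1], index=pd.Index([1, 2], name="OBJECTID"), name="n_points"
).reset_index()

# Default merge is an inner join on the shared column: polygon 3 disappears
inner = polys.merge(counts)

# A left merge keeps every polygon; missing counts become NaN, then 0
left = polys.merge(counts, how="left", on="OBJECTID")
left["n_points"] = left["n_points"].fillna(0).astype(int)
```

With the left merge, the one-liner keeps its brevity while still reporting polygons that contain no points.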
Upvotes: 0
Reputation: 186
You need to add one of the columns from the output of count() back into the original DataFrame using merge. I have used the geometry column and renamed it to n_points:
df_poly.merge(
    df_poly.sjoin(dfp)
    .groupby('OBJECTID')
    .count()
    .geometry.rename('n_points')
    .reset_index()
)
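As a side note on count() versus size(), the following sketch uses plain pandas with invented data (the joined frame and its label column are hypothetical). count() returns one column per input column and skips missing values, which is why a single column has to be picked afterwards, while size() returns one row count per group:

```python
import numpy as np
import pandas as pd

# Hypothetical joined table: one row per (polygon, point) match
joined = pd.DataFrame({
    "OBJECTID": [1, 1, 2],
    "label": ["x", np.nan, "y"],  # one missing value on purpose
})

grouped = joined.groupby("OBJECTID")

# count() gives one column per input column, counting non-null values only
per_column = grouped.count()

# size() gives a single Series with the number of rows in each group
per_group = grouped.size().rename("n_points")
```

Picking the geometry column from count() works as long as that column has no missing values in the joined result; size() does not depend on any particular column.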
Upvotes: 2
Reputation: 31146
This is a follow-on to this question: The indices of the two GeoSeries are different - Understanding Indices
gpd.sjoin(dfp, df_poly).groupby("index_right").size().rename("points")
can then simply be joined to the polygon GeoDataFrame to give how many points were found. Use how="left" to ensure it's a left join, not an inner join. Any polygons with no points will have NaN; you may want to fillna(0) in this case.
import pandas as pd
import numpy as np
import geopandas as gpd
import shapely.geometry
import requests
# source some points and polygons
# fmt: off
dfp = pd.read_html("https://www.latlong.net/category/cities-235-15.html")[0]
dfp = pd.concat([dfp,dfp]).reset_index(drop=True)
dfp = gpd.GeoDataFrame(dfp, geometry=dfp.loc[:,["Longitude", "Latitude",]].apply(shapely.geometry.Point, axis=1))
res = requests.get("https://opendata.arcgis.com/datasets/69dc11c7386943b4ad8893c45648b1e1_0.geojson")
df_poly = gpd.GeoDataFrame.from_features(res.json())
# fmt: on
df_poly.join(
    gpd.sjoin(dfp, df_poly).groupby("index_right").size().rename("points"),
    how="left",
)
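The same pattern can be tried without any network access. The sketch below is an illustration on invented geometries (the polys and pts frames are hypothetical, and no CRS is set on either), including the optional fillna(0) step:

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical data: three unit squares side by side, and three points
polys = gpd.GeoDataFrame(
    {"OBJECTID": [10, 11, 12]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
)
pts = gpd.GeoDataFrame(
    geometry=[Point(0.5, 0.5), Point(0.25, 0.75), Point(1.5, 0.5)]
)

# Count matches per polygon row; "index_right" holds the polygon index
counts = gpd.sjoin(pts, polys).groupby("index_right").size().rename("points")

# Left join on the index keeps the empty polygon; fill its NaN with 0
result = polys.join(counts, how="left")
result["points"] = result["points"].fillna(0).astype(int)
```

Because the counts are joined on the polygon index rather than on OBJECTID, this relies on polys keeping its default, unnamed index.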
Upvotes: 3