Reputation: 385
I have a dataframe with a columns for Country and Coordinates (country_gdf). I have a seperate dataframe with a columns for City, Latitude, and Longitude (cities_gpd).
I want to add a new column to the Country dataframe for a count of the number of cities within a country. But, I'm not sure of how to work with coordinates
My sample data is as follows;
import pandas as pd
import geopandas as gpd
from shapely import wkt
# Sample Country dataframe
country = pd.DataFrame(
{'Country': ['Argentina', 'Brazil', 'Chile'],
'Coordinates': ['POINT(-58.66000 -34.58000)', 'POINT(-47.91000 -15.78000)',
'POINT(-70.66000 -33.45000)']})
country['Coordinates'] = gpd.GeoSeries.from_wkt(country['Coordinates'])
country_gdf = gpd.GeoDataFrame(country, geometry='Coordinates')
# Sample Cities Dataframe
cities = pd.DataFrame(
{'City': ['Buenos Aires', 'Brasilia', 'Santiago'],
'Latitude': [-38.4161, -14.2350, -33.4489],
'Longitude': [-63.6167, -51.9253, -70.6693]})
cities_gpd = gpd.GeoDataFrame(cities,
geometry=gpd.points_from_xy(cities.Longitude,
cities.Latitude))
I have tried
from shapely.geometry import Point
from shapely.geometry.polygon import Polygon
polygon= country_gdf['Coordinates']
point = cities_gpd['geometry']
print(polygon.contains(point))
but the output is wrong
Upvotes: 0
Views: 472
Reputation: 17159
You need polygon boundaries for countries to check containment. You can use the data provided with geopandas.
import pandas as pd
import geopandas as gpd
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
cities = pd.DataFrame(
{'City': ['Buenos Aires', 'Brasilia', 'Santiago', 'São Paulo'],
'Latitude': [-38.4161, -14.2350, -33.4489, -23.55],
'Longitude': [-63.6167, -51.9253, -70.6693, -46.633333]})
cities_gpd = gpd.GeoDataFrame(cities,
geometry=gpd.points_from_xy(cities.Longitude,
cities.Latitude),
crs=4326)
countries_gpd = world.loc[world['name'].isin(['Argentina', 'Brazil', 'Chile'])]
countries_gpd.set_index('name', inplace=True)
You get the following geodataframe
pop_est continent iso_a3 gdp_md_est geometry
name
Argentina 44293293 South America ARG 879400.0 MULTIPOLYGON (((-68.63401 -52.63637, -68.25000...
Chile 17789267 South America CHL 436100.0 MULTIPOLYGON (((-68.63401 -52.63637, -68.63335...
Brazil 207353391 South America BRA 3081000.0 POLYGON ((-53.37366 -33.76838, -53.65054 -33.2...
Now you can say
countries_gpd['geometry'].contains(
cities_gpd.set_index('City')
.filter(like='Buenos Aires',axis=0)['geometry'].item())
to get
name
Argentina True
Chile False
Brazil False
Note cities_gpd
needs a crs
specified explicitly to avoid a warning about CRS mismatch.
Now to get a count of the cities you can go on with
countries_gpd.apply(lambda x:
cities_gpd['geometry'].within(x['geometry']).sum(), axis=1)
This would return
name
Argentina 1
Chile 1
Brazil 2
dtype: int64
meaning each country has one city.
Upvotes: 1