Rebecca James
Rebecca James

Reputation: 385

Create a column to count the number of cities in a country based on coordinates

I have a dataframe with a columns for Country and Coordinates (country_gdf). I have a seperate dataframe with a columns for City, Latitude, and Longitude (cities_gpd).

I want to add a new column to the Country dataframe for a count of the number of cities within a country. But, I'm not sure of how to work with coordinates

My sample data is as follows;

import pandas as pd
import geopandas as gpd
from shapely import wkt

# Sample Country dataframe
country = pd.DataFrame(
    {'Country': ['Argentina', 'Brazil', 'Chile'],
     'Coordinates': ['POINT(-58.66000 -34.58000)', 'POINT(-47.91000 -15.78000)',
                     'POINT(-70.66000 -33.45000)']})
country['Coordinates'] = gpd.GeoSeries.from_wkt(country['Coordinates'])
country_gdf = gpd.GeoDataFrame(country, geometry='Coordinates')

# Sample Cities Dataframe
cities = pd.DataFrame(
    {'City': ['Buenos Aires', 'Brasilia', 'Santiago'],
     'Latitude': [-38.4161, -14.2350, -33.4489],
     'Longitude': [-63.6167, -51.9253, -70.6693]})
cities_gpd = gpd.GeoDataFrame(cities, 
                              geometry=gpd.points_from_xy(cities.Longitude, 
                                                          cities.Latitude))
                               

I have tried

from shapely.geometry import Point
from shapely.geometry.polygon import Polygon

polygon= country_gdf['Coordinates']
point = cities_gpd['geometry']

print(polygon.contains(point))

but the output is wrong

Upvotes: 0

Views: 472

Answers (1)

Dima Chubarov
Dima Chubarov

Reputation: 17159

You need polygon boundaries for countries to check containment. You can use the data provided with geopandas.

import pandas as pd
import geopandas as gpd

world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))

cities = pd.DataFrame(
    {'City': ['Buenos Aires', 'Brasilia', 'Santiago', 'São Paulo'],
     'Latitude': [-38.4161, -14.2350, -33.4489, -23.55],
     'Longitude': [-63.6167, -51.9253, -70.6693, -46.633333]})
cities_gpd = gpd.GeoDataFrame(cities, 
                              geometry=gpd.points_from_xy(cities.Longitude, 
                                                          cities.Latitude),
                              crs=4326)
countries_gpd = world.loc[world['name'].isin(['Argentina', 'Brazil', 'Chile'])]

countries_gpd.set_index('name', inplace=True)

You get the following geodataframe

            pop_est      continent iso_a3  gdp_md_est                                           geometry
name                                                                                                     
Argentina   44293293  South America    ARG    879400.0  MULTIPOLYGON (((-68.63401 -52.63637, -68.25000...
Chile       17789267  South America    CHL    436100.0  MULTIPOLYGON (((-68.63401 -52.63637, -68.63335...
Brazil     207353391  South America    BRA   3081000.0  POLYGON ((-53.37366 -33.76838, -53.65054 -33.2...

Now you can say

countries_gpd['geometry'].contains(
       cities_gpd.set_index('City')
                 .filter(like='Buenos Aires',axis=0)['geometry'].item())

to get

name
Argentina     True
Chile        False
Brazil       False

Note cities_gpd needs a crs specified explicitly to avoid a warning about CRS mismatch.

Now to get a count of the cities you can go on with

countries_gpd.apply(lambda x:
     cities_gpd['geometry'].within(x['geometry']).sum(), axis=1)

This would return

name
Argentina    1
Chile        1
Brazil       2
dtype: int64

meaning each country has one city.

Upvotes: 1

Related Questions