Izzat Kamaruddin

Reputation: 11

How to sum values from one GeoDataFrame based on the boundaries in another

I have a GeoDataFrame containing the following columns:

  1. no. of mobile subscription
  2. longitude (X)
  3. latitude (Y)

and another GeoDataFrame called "boundaries", which contains the boundary geometries.

I want to add a column to the boundaries GeoDataFrame that holds the sum of mobile subscriptions for all points whose latitude and longitude fall inside each boundary.

I would really appreciate any help with this.

I have tried merging the two data frames, but I don't know how to aggregate the data by boundary.

Upvotes: 0

Views: 72

Answers (1)

kithuto

Reputation: 539

This answer computes the total number of subscriptions that fall inside each boundary:

import geopandas as gpd
import pandas as pd

# creating a dummy boundary geodataframe
df = pd.DataFrame({'name': ['first boundary', 'second boundary'],
                    'area': ['POLYGON ((-10 -3, -10 3, 3 3, 3 -10, -10 -3))', 'POLYGON ((-20 -21, -12 -17, 2 -15, 5 -20, -20 -21))']})

boundaries = gpd.GeoDataFrame(df[['name']], geometry=gpd.GeoSeries.from_wkt(df.area, crs = 'epsg:4326'))

# creating a dummy geodataframe with some points (you can change it to your coordinates)
points = pd.DataFrame({'num_sub': [1, 2, 3, 4, 5],
                       'coordenates': ['POINT(-7 1)', 'POINT(1 -2)', 'POINT(-17 -20)', 'POINT(0 -18)', 'POINT(-5 0)']})

subs_coordenates = gpd.GeoDataFrame(points[['num_sub']], geometry=gpd.GeoSeries.from_wkt(points.coordenates, crs = 'epsg:4326'))

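# summing num_sub over the points inside each boundary and storing the result in a num_subs column
# (summing the boolean mask alone would only count the points, not add up their subscriptions)
boundaries['num_subs'] = boundaries.geometry.apply(lambda poly: subs_coordenates.loc[subs_coordenates.within(poly), 'num_sub'].sum())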

If you have the X and Y coordinates in different columns (named X and Y in this example), you can do as follows:

points = pd.DataFrame({'num_sub': [1, 2, 3, 4, 5],
                       'X': [-7, 1, -17, 0, -5],
                       'Y': [1, -2, -20, -18, 0]})

# converting the X and Y columns directly to point geometries and creating the geopandas dataframe
subs_coordenates = gpd.GeoDataFrame(points[['num_sub']], geometry=gpd.points_from_xy(points['X'], points['Y'], crs='epsg:4326'))

# summing num_sub over the points inside each boundary and storing the result in a num_subs column
boundaries['num_subs'] = boundaries.geometry.apply(lambda poly: subs_coordenates.loc[subs_coordenates.within(poly), 'num_sub'].sum())
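
For larger datasets, a spatial join followed by a groupby may scale better than testing every point against every polygon. The following is a minimal sketch of that alternative, not part of the approach above. It reuses the same dummy data and assumes that the boundary names are unique and that your geopandas version accepts the predicate keyword (older releases called it op):

```python
import geopandas as gpd
import pandas as pd

# Same dummy boundaries as above
boundaries = gpd.GeoDataFrame(
    {'name': ['first boundary', 'second boundary']},
    geometry=gpd.GeoSeries.from_wkt(
        ['POLYGON ((-10 -3, -10 3, 3 3, 3 -10, -10 -3))',
         'POLYGON ((-20 -21, -12 -17, 2 -15, 5 -20, -20 -21))'],
        crs='epsg:4326'))

# Same dummy points as above, built from separate X and Y columns
points = pd.DataFrame({'num_sub': [1, 2, 3, 4, 5],
                       'X': [-7, 1, -17, 0, -5],
                       'Y': [1, -2, -20, -18, 0]})
subs_coordenates = gpd.GeoDataFrame(
    points[['num_sub']],
    geometry=gpd.points_from_xy(points['X'], points['Y'], crs='epsg:4326'))

# Attach to each point the attributes of the boundary it lies within
joined = gpd.sjoin(subs_coordenates, boundaries, predicate='within', how='inner')

# Total subscriptions per boundary name (assumes names are unique)
totals = joined.groupby('name')['num_sub'].sum()

# Boundaries that contain no points get 0 instead of NaN
boundaries['num_subs'] = boundaries['name'].map(totals).fillna(0).astype(int)
print(boundaries[['name', 'num_subs']])
```

With this dummy data, the first boundary contains the points carrying 1, 2 and 5 subscriptions and the second contains those carrying 3 and 4.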

Hope it works for you.

Upvotes: 0
