Cohen Fisher

Reputation: 67

Matching two data frames of longitudes and latitudes in python

I have a list of longitudes and latitudes for stores located in a city.

ID   Latitude Longitude
1    28.2828  48.8392
2    28.3829  48.2947
3    27.9274  48.9274
4    28.9284  48.1937
5    27.2749  48.2804
… 
1000 27.9292  48.9284

I have another list of longitudes and latitudes for stores located in the state.

ID   Latitude Longitude
8392 28.73948 48.9284
7274 19.82744 27.2837
7293 28.72847 48.92847
8384 18.28474 83.29374
2848 28.92745 48.8293
…

Using Python, how can I find which data points in the second data frame are located in the area made up by the first data frame?

In other words, this is my desired result, because these IDs in the second data frame are located in the city made up by the first data frame. All of the other IDs are filtered out because they lie in other areas.

ID    Latitude Longitude 
8392 28.73948 48.9284
7293 28.72847 48.92847
2848 28.92745 48.8293

Upvotes: 0

Views: 277

Answers (1)

Rob Raymond

Reputation: 31146

  • Your sample data does not contain any state points that fall inside the convex hull of the city points.
  • To find an intersection you need a polygon that represents your city. One way to get it is the convex hull of the city points: https://shapely.readthedocs.io/en/latest/manual.html#object.convex_hull
  • Once you have a polygon that represents the city, you can sjoin() it to the other points. I have simulated this by adding two city points to the state data so that some points match.
  • A visualisation is also provided to demonstrate.
import pandas as pd
import geopandas as gpd
import io
import shapely

df_city = pd.read_csv(
    io.StringIO(
        """ID   Latitude Longitude
1    28.2828  48.8392
2    28.3829  48.2947
3    27.9274  48.9274
4    28.9284  48.1937
5    27.2749  48.2804
1000 27.9292  48.9284"""
    ),
    sep=r"\s+",
)

df_state = pd.read_csv(
    io.StringIO(
        """ID   Latitude Longitude
8392 28.73948 48.9284
7274 19.82744 27.2837
7293 28.72847 48.92847
8384 18.28474 83.29374
2848 28.92745 48.8293"""
    ),
    sep=r"\s+",
)

# Polygon for the city: convex hull of all city points (x = longitude, y = latitude)
city_geom = shapely.geometry.MultiPoint(
    gpd.points_from_xy(df_city["Longitude"], df_city["Latitude"])
).convex_hull


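# The sample state points all fall outside the hull, so append two random
# city points to simulate state points that do lie inside the city.
df_state2 = pd.concat([df_state, df_city.sample(2)])
gdf_state2 = gpd.GeoDataFrame(
    df_state2,
    geometry=gpd.points_from_xy(df_state2["Longitude"], df_state2["Latitude"], crs="epsg:4326"),
)
gdf_city = gpd.GeoDataFrame(geometry=[city_geom], crs="epsg:4326")

# The spatial join keeps only the points that intersect the city polygon
gdf_state2.sjoin(gdf_city)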
ID  Latitude  Longitude  geometry                 index_right
1   28.2828   48.8392    POINT (48.8392 28.2828)  0
3   27.9274   48.9274    POINT (48.9274 27.9274)  0

visualisation

m = gpd.GeoDataFrame(
    df_state, geometry=gpd.points_from_xy(df_state["Longitude"], df_state["Latitude"], crs="epsg:4326")
).explore()
gpd.GeoDataFrame(geometry=[city_geom], crs="epsg:4326").explore(m=m)
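
For intuition about what the convex hull and the point-in-polygon test are doing, here is a minimal dependency-free sketch. It is not how shapely or geopandas implement it; it swaps in Andrew's monotone chain for the hull and a cross-product sign test for containment. The helper names (cross, convex_hull, in_hull) are made up for illustration, longitude/latitude are treated as plain planar x/y, and it assumes at least three city points that are not all on one line. Points on the hull boundary count as inside.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude) treated as planar (x, y)


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one
    return lower[:-1] + upper[:-1]


def in_hull(hull: Sequence[Point], p: Point) -> bool:
    """True if p is inside or on the boundary of a counter-clockwise convex hull."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


# Sample data from the question, keyed by ID, stored as (Longitude, Latitude)
city = {
    1: (48.8392, 28.2828),
    2: (48.2947, 28.3829),
    3: (48.9274, 27.9274),
    4: (48.1937, 28.9284),
    5: (48.2804, 27.2749),
    1000: (48.9284, 27.9292),
}
state = {
    8392: (48.9284, 28.73948),
    7274: (27.2837, 19.82744),
    7293: (48.92847, 28.72847),
    8384: (83.29374, 18.28474),
    2848: (48.8293, 28.92745),
}

hull = convex_hull(list(city.values()))
inside = [sid for sid, p in state.items() if in_hull(hull, p)]
print(inside)  # [] : none of the sample state points fall inside the city hull
```

With the sample rows shown in the question the result is empty, which matches the first bullet of this answer. The real data set (all 1000 city points) will give a different hull and so may give different matches.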



Upvotes: 1
