S_S

Reputation: 1412

Pandas DataFrame to GeoDataFrame with Polygon geometry using groupby and lambda

I have a pandas DataFrame like this

name    loc_x    loc_y    grp_name
a1        1.0        2.0    set1
a2        2.0        3.0    set1
a3        3.2        4.1    set2
a4        7.9        4.2    set2

I want to build a GeoDataFrame with one polygon per grp_name, constructed from loc_x and loc_y, plus a name column holding that group's name values from the original DataFrame joined with |. The result should look like this:

        name    geometry
set1    a1|a2   POLYGON ((1.0, 2.0)...)
set2    a3|a4   POLYGON ((3.2, 4.1)...)

The code below gives me the geometry column, but how do I also get a column with the names from my base DataFrame concatenated per group?

import geopandas as gpd
from shapely.geometry import Polygon

gdf = gpd.GeoDataFrame(geometry=df.groupby('grp_name').apply(
      lambda g: Polygon(gpd.points_from_xy(g['loc_x'], g['loc_y']))))

Upvotes: 0

Views: 1144

Answers (1)

Rob Raymond

Reputation: 31236

  • Your test data needed a modification: a polygon requires at least three points, so I added a third row to each group.
  • The rest comes down to knowing pandas. groupby().apply() passes each group's DataFrame to the function, so it is simple to build both outputs you want (the joined name and the polygon) per group and return them as a Series.
import pandas as pd
import geopandas as gpd
import shapely.geometry
import io

df = pd.read_csv(io.StringIO("""name    loc_x    loc_y    grp_name
a1        1.0        2.0    set1
a2        2.0        3.0    set1
a2.5      3.0        4.0    set1
a3        3.2        4.1    set2
a4        7.9        4.2    set2
a4.5      8.1        4.3    set2"""), sep=r"\s+")

gpd.GeoDataFrame(
    df.groupby("grp_name").apply(
        lambda d: pd.Series(
            {
                "name": "|".join(d["name"].tolist()),
                "geometry": shapely.geometry.Polygon(
                    d.loc[:, ["loc_x", "loc_y"]].values
                ),
            }
        )
    )
)
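
The same per-group logic can also be written as an explicit loop, which may be easier to follow and avoids the deprecation warning that recent pandas versions emit when apply() operates on the grouping columns. This is only a sketch under some assumptions: the variable names rows and gdf, the inline sample data (copied from the modified test data above), and the choice to skip groups with fewer than three points are mine, not part of the original question.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon

# Same sample data as above, built inline so the snippet is self-contained.
df = pd.DataFrame(
    {
        "name": ["a1", "a2", "a2.5", "a3", "a4", "a4.5"],
        "loc_x": [1.0, 2.0, 3.0, 3.2, 7.9, 8.1],
        "loc_y": [2.0, 3.0, 4.0, 4.1, 4.2, 4.3],
        "grp_name": ["set1", "set1", "set1", "set2", "set2", "set2"],
    }
)

rows = []
for grp, d in df.groupby("grp_name"):
    # Assumption: groups with fewer than three points cannot form a
    # polygon, so they are skipped here rather than raising an error.
    if len(d) < 3:
        continue
    rows.append(
        {
            "grp_name": grp,
            # Concatenate the group's names with "|".
            "name": "|".join(d["name"]),
            # Build the polygon from the (x, y) pairs in row order.
            "geometry": Polygon(list(zip(d["loc_x"], d["loc_y"]))),
        }
    )

# Naming the geometry column explicitly makes the intent clear.
gdf = gpd.GeoDataFrame(rows, geometry="geometry").set_index("grp_name")
print(gdf)
```

Note that the three set1 points in the sample data lie on a straight line, so that polygon has zero area; real data would normally need points that actually enclose a region.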

Upvotes: 1
