KRL

Reputation: 125

Finding the start and end X and Y of a LineString in GeoPandas

I have not been able to find this anywhere, so hopefully I don't get flamed too much here.

I have a polyline shapefile, and I'm trying to extract the start and end X/Y coordinates of each line as new columns, but I can't find how to do this with GeoPandas.

I would like to end up with four new columns: StartX, StartY, EndX, and EndY.

Does anyone know how to do this?

Upvotes: 0

Views: 1247

Answers (1)

Michael Delgado

Reputation: 15442

Here's an MRE (minimal reproducible example) with 100 random-length LineStrings:

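import numpy as np
import shapely.geometry
import geopandas as gpd

# 100 LineStrings, each with 2-19 random vertices. np.cumsum(..., axis=1)
# adds each vertex's x to its y, so every y is x plus a random offset.
gdf = gpd.GeoDataFrame(
    geometry=[
        shapely.geometry.LineString(
            np.cumsum(np.random.random(size=(np.random.randint(2, 20), 2)), axis=1)
        )
        for _ in range(100)
    ],
)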

The dataframe looks like this:

In [2]: gdf
Out[2]:
                                             geometry
0   LINESTRING (0.36610 1.03088, 0.06126 0.29416, ...
1   LINESTRING (0.46164 1.26251, 0.16294 0.45719, ...
2   LINESTRING (0.45853 1.00003, 0.81500 0.92658, ...
3   LINESTRING (0.89925 1.11712, 0.22847 0.97792, ...
4   LINESTRING (0.05748 1.04220, 0.19561 0.86062, ...
..                                                ...
95  LINESTRING (0.62349 0.71080, 0.91981 1.44771, ...
96  LINESTRING (0.18924 0.91123, 0.94212 1.39855, ...
97  LINESTRING (0.79314 1.29408, 0.20462 0.73740, ...
98  LINESTRING (0.07744 0.87544, 0.87101 0.97909, ...
99  LINESTRING (0.31411 0.53442, 0.63755 0.78146, ...

[100 rows x 1 columns]

You can use gdf.boundary to get the boundary of each LineString, i.e. its start and end points, as a MultiPoint:

In [3]: gdf.boundary
Out[3]:
0     MULTIPOINT (0.36610 1.03088, 0.32418 0.81727)
1     MULTIPOINT (0.46164 1.26251, 0.30703 0.95910)
2     MULTIPOINT (0.45853 1.00003, 0.95016 1.53127)
3     MULTIPOINT (0.89925 1.11712, 0.95730 1.13740)
4     MULTIPOINT (0.05748 1.04220, 0.42954 1.36282)
                          ...
95    MULTIPOINT (0.62349 0.71080, 0.93710 1.55117)
96    MULTIPOINT (0.18924 0.91123, 0.48047 1.08956)
97    MULTIPOINT (0.79314 1.29408, 0.24173 0.56003)
98    MULTIPOINT (0.07744 0.87544, 0.23844 1.23815)
99    MULTIPOINT (0.31411 0.53442, 0.00648 0.76329)
Length: 100, dtype: geometry
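One caveat worth checking against your own data: Shapely/GEOS follow the OGC boundary rules, under which a closed LineString (first vertex equal to the last) has an empty boundary. Such rows would therefore yield missing values in the explode/unstack steps. A minimal sketch with two made-up lines, one open and one closed:

```python
import geopandas as gpd
from shapely.geometry import LineString

# Hypothetical example data: an open line and a closed line
# (the closed one returns to its first vertex)
lines = gpd.GeoSeries([
    LineString([(0, 0), (1, 0), (1, 1)]),
    LineString([(0, 0), (1, 0), (1, 1), (0, 0)]),
])

# The open line's boundary is a two-point MultiPoint;
# the closed line's boundary is an empty MultiPoint
boundary = lines.boundary
print(boundary.is_empty.tolist())  # [False, True]
```

If your shapefile may contain closed lines (rings), check `gdf.boundary.is_empty` before relying on this approach.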

The boundary can then be combined with explode(), which converts each multi-part geometry into individual rows. Unstacking the resulting extra index level then creates new columns, 0 and 1, for the start and end points. Each column contains shapely.geometry.Point objects:

In [4]: bounds = gdf.geometry.boundary.explode(index_parts=True).unstack()
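To make the explode/unstack step concrete, here is a small sketch on two hand-made lines (the data and the name `ends` are illustrative only). It relies on GEOS returning a LineString's boundary points in start, end order, which matches the output shown in Out[3]:

```python
import geopandas as gpd
from shapely.geometry import LineString

# Hypothetical toy data with known start and end points
lines = gpd.GeoSeries([
    LineString([(0, 0), (1, 2), (3, 4)]),
    LineString([(5, 5), (6, 7)]),
])

# boundary -> MultiPoint(start, end) per row
# explode(index_parts=True) -> one row per point, indexed by (row, part)
# unstack() -> the part number (0 or 1) becomes the column label
ends = lines.boundary.explode(index_parts=True).unstack()

start_xy = [(p.x, p.y) for p in ends[0]]
end_xy = [(p.x, p.y) for p in ends[1]]
print(start_xy)  # [(0.0, 0.0), (5.0, 5.0)]
print(end_xy)    # [(3.0, 4.0), (6.0, 7.0)]
```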

Finally, we can read the x and y coordinates of the points in bounds directly:

In [5]: gdf['StartX'] = bounds[0].x
   ...: gdf['StartY'] = bounds[0].y
   ...: gdf['EndX'] = bounds[1].x
   ...: gdf['EndY'] = bounds[1].y

This gives the final result you were looking for:

In [6]: gdf
Out[6]:
                                             geometry    StartX    StartY      EndX      EndY
0   LINESTRING (0.36610 1.03088, 0.06126 0.29416, ...  0.366098  1.030880  0.324176  0.817272
1   LINESTRING (0.46164 1.26251, 0.16294 0.45719, ...  0.461642  1.262513  0.307032  0.959099
2   LINESTRING (0.45853 1.00003, 0.81500 0.92658, ...  0.458530  1.000032  0.950164  1.531267
3   LINESTRING (0.89925 1.11712, 0.22847 0.97792, ...  0.899254  1.117123  0.957299  1.137399
4   LINESTRING (0.05748 1.04220, 0.19561 0.86062, ...  0.057482  1.042202  0.429537  1.362817
..                                                ...       ...       ...       ...       ...
95  LINESTRING (0.62349 0.71080, 0.91981 1.44771, ...  0.623486  0.710795  0.937098  1.551175
96  LINESTRING (0.18924 0.91123, 0.94212 1.39855, ...  0.189237  0.911229  0.480470  1.089565
97  LINESTRING (0.79314 1.29408, 0.20462 0.73740, ...  0.793135  1.294084  0.241726  0.560030
98  LINESTRING (0.07744 0.87544, 0.87101 0.97909, ...  0.077441  0.875441  0.238441  1.238148
99  LINESTRING (0.31411 0.53442, 0.63755 0.78146, ...  0.314106  0.534418  0.006481  0.763287
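As an alternative, assuming Shapely 2.0 or later is installed, `shapely.get_point` can take the first and last vertex of every line in one vectorized call. This skips the explode/unstack step and also works for closed lines, whose boundary is empty. A sketch on hypothetical data:

```python
import geopandas as gpd
import shapely
from shapely.geometry import LineString

# Hypothetical data: one open line and one closed line
gdf = gpd.GeoDataFrame(geometry=[
    LineString([(0, 0), (1, 2), (3, 4)]),
    LineString([(5, 5), (6, 7), (5, 5)]),
])

# Convert to a NumPy array of shapely geometries for the vectorized functions
geoms = gdf.geometry.to_numpy()

# Index 0 is the first vertex; negative indices count back from the last one
start = shapely.get_point(geoms, 0)
end = shapely.get_point(geoms, -1)

gdf["StartX"] = shapely.get_x(start)
gdf["StartY"] = shapely.get_y(start)
gdf["EndX"] = shapely.get_x(end)
gdf["EndY"] = shapely.get_y(end)
print(gdf[["StartX", "StartY", "EndX", "EndY"]])
```

Note that `get_point` only applies to LineStrings; for MultiLineString rows it returns None, so those would need to be exploded first.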

Upvotes: 4
