I have not been able to find this anywhere, so hopefully I don't get flamed too much here.
I have a polyline shapefile, and I'm trying to extract the start and end XY coordinates as new columns, but I can't find how to do this with geopandas.
I would like to end up with four new columns: StartX, StartY, EndX, and EndY.
Would anyone know how to get this?
Here's an MRE (minimal reproducible example) with 100 random-length linestrings:
import pandas as pd, numpy as np, shapely.geometry, geopandas as gpd
gdf = gpd.GeoDataFrame(
    geometry=[
        # each LineString gets 2-19 random vertices; cumsum along axis=1 makes y = x + a random offset
        shapely.geometry.LineString([tuple(i) for i in np.cumsum(np.random.random(size=(np.random.randint(2, 20), 2)), axis=1)])
        for _ in range(100)
    ],
)
The dataframe looks like this:
In [2]: gdf
Out[2]:
geometry
0 LINESTRING (0.36610 1.03088, 0.06126 0.29416, ...
1 LINESTRING (0.46164 1.26251, 0.16294 0.45719, ...
2 LINESTRING (0.45853 1.00003, 0.81500 0.92658, ...
3 LINESTRING (0.89925 1.11712, 0.22847 0.97792, ...
4 LINESTRING (0.05748 1.04220, 0.19561 0.86062, ...
.. ...
95 LINESTRING (0.62349 0.71080, 0.91981 1.44771, ...
96 LINESTRING (0.18924 0.91123, 0.94212 1.39855, ...
97 LINESTRING (0.79314 1.29408, 0.20462 0.73740, ...
98 LINESTRING (0.07744 0.87544, 0.87101 0.97909, ...
99 LINESTRING (0.31411 0.53442, 0.63755 0.78146, ...
[100 rows x 1 columns]
You can use gdf.boundary to get the boundary of each LineString (its start and end points) as a MultiPoint:
In [3]: gdf.boundary
Out[3]:
0 MULTIPOINT (0.36610 1.03088, 0.32418 0.81727)
1 MULTIPOINT (0.46164 1.26251, 0.30703 0.95910)
2 MULTIPOINT (0.45853 1.00003, 0.95016 1.53127)
3 MULTIPOINT (0.89925 1.11712, 0.95730 1.13740)
4 MULTIPOINT (0.05748 1.04220, 0.42954 1.36282)
...
95 MULTIPOINT (0.62349 0.71080, 0.93710 1.55117)
96 MULTIPOINT (0.18924 0.91123, 0.48047 1.08956)
97 MULTIPOINT (0.79314 1.29408, 0.24173 0.56003)
98 MULTIPOINT (0.07744 0.87544, 0.23844 1.23815)
99 MULTIPOINT (0.31411 0.53442, 0.00648 0.76329)
Length: 100, dtype: geometry
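One caveat worth checking before relying on boundary: for an open LineString the boundary is its first and last vertex, in that order, but a closed LineString (one whose last vertex repeats its first) has an empty boundary under the default OGC rule, so there are no start and end points to extract for it. The small shapely-only check below illustrates both cases; the coordinates are made up for illustration.

```python
from shapely.geometry import LineString

# An open LineString: its boundary holds the first and last vertex.
open_line = LineString([(0, 0), (1, 0), (1, 1)])
# A closed LineString (last vertex repeats the first): its boundary is empty.
closed_line = LineString([(0, 0), (1, 0), (1, 1), (0, 0)])

# Unpack the two Point objects of the MultiPoint (start first, then end).
start, end = open_line.boundary.geoms
print(start.x, start.y, end.x, end.y)  # 0.0 0.0 1.0 1.0

print(closed_line.is_closed, closed_line.boundary.is_empty)  # True True
```

If your shapefile may contain closed lines, checking gdf.geometry.is_closed first tells you which rows the boundary approach will not cover.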
This can then be combined with explode(), which converts any multi-part geometry into individual rows, and with unstack(), which pivots the resulting extra index level (the part number) into new columns, 0 and 1, for the starting and ending points. Each column will contain shapely.geometry.Point objects:
In [4]: bounds = gdf.geometry.boundary.explode(index_parts=True).unstack()
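To see what explode() and unstack() are doing, here is a minimal sketch on two made-up lines: after explode(index_parts=True) the series carries a two-level index of (original row, part number), and unstack() moves the part number into the columns 0 and 1.

```python
import geopandas as gpd
from shapely.geometry import LineString

# Two made-up open lines so the intermediate results are easy to read.
lines = gpd.GeoSeries([
    LineString([(0, 0), (1, 2), (3, 4)]),
    LineString([(5, 5), (6, 7)]),
])

# boundary gives one MultiPoint per row; explode gives one Point per row,
# indexed by (original row, part number).
exploded = lines.boundary.explode(index_parts=True)
print(exploded.index.tolist())

# unstack moves the part number (0 = start, 1 = end) into columns.
bounds = exploded.unstack()
print(bounds)
```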
Finally, we can grab the x and y values for these points directly:
In [5]: gdf['StartX'] = bounds[0].x
...: gdf['StartY'] = bounds[0].y
...: gdf['EndX'] = bounds[1].x
...: gdf['EndY'] = bounds[1].y
This gives the final result you were looking for:
In [6]: gdf
Out[6]:
geometry StartX StartY EndX EndY
0 LINESTRING (0.36610 1.03088, 0.06126 0.29416, ... 0.366098 1.030880 0.324176 0.817272
1 LINESTRING (0.46164 1.26251, 0.16294 0.45719, ... 0.461642 1.262513 0.307032 0.959099
2 LINESTRING (0.45853 1.00003, 0.81500 0.92658, ... 0.458530 1.000032 0.950164 1.531267
3 LINESTRING (0.89925 1.11712, 0.22847 0.97792, ... 0.899254 1.117123 0.957299 1.137399
4 LINESTRING (0.05748 1.04220, 0.19561 0.86062, ... 0.057482 1.042202 0.429537 1.362817
.. ... ... ... ... ...
95 LINESTRING (0.62349 0.71080, 0.91981 1.44771, ... 0.623486 0.710795 0.937098 1.551175
96 LINESTRING (0.18924 0.91123, 0.94212 1.39855, ... 0.189237 0.911229 0.480470 1.089565
97 LINESTRING (0.79314 1.29408, 0.20462 0.73740, ... 0.793135 1.294084 0.241726 0.560030
98 LINESTRING (0.07744 0.87544, 0.87101 0.97909, ... 0.077441 0.875441 0.238441 1.238148
99 LINESTRING (0.31411 0.53442, 0.63755 0.78146, ... 0.314106 0.534418 0.006481 0.763287
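If some of your lines might be closed, or you would rather skip the explode/unstack step, another option (not part of the answer above, just a sketch) is to read the first and last vertex from each geometry's coords. This assumes every geometry is a single-part LineString: coords is not available on a MultiLineString, which a polyline shapefile can contain, so those rows would need exploding first.

```python
import geopandas as gpd
from shapely.geometry import LineString

# Made-up example; the second line is closed (it ends where it starts).
gdf = gpd.GeoDataFrame(geometry=[
    LineString([(0, 0), (1, 2), (3, 4)]),
    LineString([(5, 5), (6, 7), (5, 5)]),
])

# coords[0] and coords[-1] are the first and last vertices as tuples.
# Assumes every geometry is a single-part LineString.
starts = [geom.coords[0] for geom in gdf.geometry]
ends = [geom.coords[-1] for geom in gdf.geometry]

gdf["StartX"] = [pt[0] for pt in starts]
gdf["StartY"] = [pt[1] for pt in starts]
gdf["EndX"] = [pt[0] for pt in ends]
gdf["EndY"] = [pt[1] for pt in ends]

print(gdf[["StartX", "StartY", "EndX", "EndY"]])
```

Unlike boundary, this still returns the repeated vertex (5, 5) as both start and end for the closed line instead of nothing.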