Reputation: 208
This question is similar to another one, but none of the solutions posted there worked for me. I have included several of those attempts and their results below. If another library can achieve this, I am open to it.
I am trying to expand a GeoJSON file with GeoPandas; the file contains multiple MultiPolygons.
Current geodataframe (3 Rows)
fill fill-opacity stroke stroke-opacity stroke-width title geometry
0 #9bf1e2 0.3 #9bf1e2 1 1 Hail Possible (POLYGON ((-80.69500140880155 22.2885709067316...
1 #08c1e6 0.3 #08c1e6 1 1 Severe Hail (POLYGON ((-103.4850007575523 29.2010260633722...
2 #682aba 0.3 #682aba 1 1 Damaging Hail (POLYGON ((-104.2750007349772 32.2629245180204...
Desired geodataframe (200+ Rows)
fill fill-opacity stroke stroke-opacity stroke-width title geometry
0 #9bf1e2 0.3 #9bf1e2 1 1 Hail Possible (POLYGON ((-80.69500140880155 22.2885709067316...
1 #9bf1e2 0.3 #9bf1e2 1 1 Hail Possible (POLYGON ((-102.8150007766983 28.2180513479277...
2 #9bf1e2 0.3 #9bf1e2 1 1 Hail Possible (POLYGON ((-103.4850007575523 29.0940821135748...
3 #9bf1e2 0.3 #9bf1e2 1 1 Hail Possible (POLYGON ((-103.5650007552662 30.9947420843694...
4 #9bf1e2 0.3 #9bf1e2 1 1 Hail Possible (POLYGON ((-103.6150007538374 31.0173836504729...
Sample of the GeoJSON file being used: https://drive.google.com/file/d/1m6cMR4jF3QWp07e23sIdb0UF9xLD062s/view?usp=sharing
What I've tried, with no success:
df3.set_index(['title'])['geometry'].apply(pd.Series).stack().reset_index()
(Returns original unchanged gdf)
def cartesian(x):
    return np.vstack(np.array([np.array(np.meshgrid(*i)).T.reshape(-1,7) for i in x.values]))

ndf = pd.DataFrame(cartesian(df3),columns=df3.columns)
(Returns original unchanged gdf)
import geopandas as gpd
from shapely.geometry.polygon import Polygon
from shapely.geometry.multipolygon import MultiPolygon

def explode(indata):
    indf = gpd.GeoDataFrame.from_file(indata)
    outdf = gpd.GeoDataFrame(columns=indf.columns)
    for idx, row in indf.iterrows():
        if type(row.geometry) == Polygon:
            outdf = outdf.append(row,ignore_index=True)
        if type(row.geometry) == MultiPolygon:
            multdf = gpd.GeoDataFrame(columns=indf.columns)
            recs = len(row.geometry)
            multdf = multdf.append([row]*recs,ignore_index=True)
            for geom in range(recs):
                multdf.loc[geom,'geometry'] = row.geometry[geom]
            outdf = outdf.append(multdf,ignore_index=True)
    return outdf

explode(GEOJSONFILE)
(Returns original unchanged gdf)
This is my first question on here so if any additional info or details are needed please let me know.
UPDATE: The problem with the explode() function turned out to be a formatting issue in the file: the geometry was essentially a multi-polygon of multi-polygons, so the loop only iterated over the first multi-polygon. The explode function itself works.
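For anyone who wants to check or pre-split the file without GeoPandas, here is a minimal sketch that works on the parsed GeoJSON directly, using only the standard library. It assumes a well-formed FeatureCollection in which each MultiPolygon's `coordinates` is a list of polygons; the function name `split_multipolygons` and the sample data are made up for illustration and are not part of any library.

```python
import copy


def split_multipolygons(feature_collection):
    """Return a new FeatureCollection with one Polygon feature per polygon part."""
    out_features = []
    for feature in feature_collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] == "MultiPolygon":
            # Each entry in a MultiPolygon's coordinates is the ring list of one Polygon.
            for rings in geometry["coordinates"]:
                part = copy.deepcopy(feature)  # keeps the properties of the parent feature
                part["geometry"] = {"type": "Polygon", "coordinates": rings}
                out_features.append(part)
        else:
            # Anything that is not a MultiPolygon is passed through unchanged.
            out_features.append(copy.deepcopy(feature))
    return {"type": "FeatureCollection", "features": out_features}


# Hypothetical sample: one feature holding two small squares.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"title": "Hail Possible", "fill": "#9bf1e2"},
            "geometry": {
                "type": "MultiPolygon",
                "coordinates": [
                    [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
                    [[[2, 2], [3, 2], [3, 3], [2, 3], [2, 2]]],
                ],
            },
        }
    ],
}

result = split_multipolygons(sample)
```

With a real file, the input dict would come from `json.load`, and the result could be written back with `json.dump` before loading it into GeoPandas.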
Upvotes: 9
Views: 12712
Reputation: 7804
You can use GeoPandas explode().
exploded = original_df.explode()
Copying from the docstring:
Explode multi-part geometries into multiple single geometries.
Each row containing a multi-part geometry will be split into
multiple rows with single geometries, thereby increasing the vertical
size of the GeoDataFrame.
The index of the input geodataframe is no longer unique and is
replaced with a multi-index (original index with additional level
indicating the multiple geometries: a new zero-based index for each
single part geometry per multi-part geometry).
Returns
-------
GeoDataFrame
Exploded geodataframe with each single geometry
as a separate entry in the geodataframe.
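As a small illustration of the behaviour the docstring describes, here is a sketch on a made-up two-row GeoDataFrame (the squares and titles are placeholders, not the data from the question). It assumes GeoPandas 0.10 or later, where `explode()` accepts the `index_parts` and `ignore_index` keywords; on older versions the bare `explode()` call above returns the multi-index form.

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, Polygon

# Two hypothetical squares standing in for the parts of one hail zone.
square_a = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
square_b = Polygon([(2, 2), (3, 2), (3, 3), (2, 3)])

original_df = gpd.GeoDataFrame(
    {"title": ["Hail Possible", "Severe Hail"]},
    geometry=[MultiPolygon([square_a, square_b]), square_a],
)

# index_parts=True keeps the (original row, part number) multi-index
# described in the docstring.
exploded = original_df.explode(index_parts=True)

# ignore_index=True instead renumbers the rows 0..N-1, which matches the
# flat index shown in the desired output of the question.
flat = original_df.explode(ignore_index=True)
```

The attribute columns are repeated for every part, so each exploded row keeps the `title` of the multi-part row it came from.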
Upvotes: 19