Reputation: 669
I have a shapefile that contains both polygons and multipolygons, as follows:
name geometry
0 AB10 POLYGON ((-2.116454759005259 57.14656265903432...
1 AB11 (POLYGON ((-2.052573095588467 57.1342600856536...
2 AB12 (POLYGON ((-2.128066321470298 57.0368357386797...
3 AB13 POLYGON ((-2.261525922489881 57.10693578217748...
4 AB14 POLYGON ((-2.261525922489879 57.10693578217748...
The 2nd and 3rd rows are MultiPolygons, while the rest are Polygons. I would like to expand each row whose geometry is a MultiPolygon into one row per Polygon, as follows:
name geometry
0 AB10 POLYGON ((-2.116454759005259 57.14656265903432...
1 AB11 POLYGON ((-2.052573095588467 57.1342600856536...
2 AB11 POLYGON ((-2.045849648028651 57.13076387483844...
3 AB12 POLYGON ((-2.128066321470298 57.0368357386797...
4 AB12 POLYGON ((-2.096125852304303 57.14808092585477
5 AB13 POLYGON ((-2.261525922489881 57.10693578217748...
6 AB14 POLYGON ((-2.261525922489879 57.10693578217748...
Note that the AB11 and AB12 MultiPolygons have been expanded into multiple rows, each holding a single polygon.
I think this is a GeoPandas data manipulation problem. Is there a Pythonic way to achieve the above?
Thank you!
Upvotes: 0
Views: 2328
Reputation: 30605
We can use NumPy for more speed if you have only two columns.
If you have a dataframe like
name geometry
0 0 polygn(x)
1 2 (polygn(x), polygn(x))
2 3 polygn(x)
3 4 (polygn(x), polygn(x))
then NumPy's meshgrid will help:
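import numpy as np
import pandas as pd
def cartesian(x):
    # Pair each row's name with each of its geometries, then stack all rows (no ragged np.array wrapper).
    return np.vstack([np.array(np.meshgrid(*i)).T.reshape(-1, 2) for i in x.values])
ndf = pd.DataFrame(cartesian(df), columns=df.columns)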
Output:
name geometry
0 0 polygn(x)
1 2 polygn(x)
2 2 polygn(x)
3 3 polygn(x)
4 4 polygn(x)
5 4 polygn(x)
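To see why meshgrid expands a row, here is a minimal sketch on individual rows. The strings "p0", "p1" and "p2" are placeholders standing in for polygon objects, and expand_row is a hypothetical helper name used only for illustration. Real Shapely geometries may behave differently inside NumPy arrays, so treat this as a demonstration of the pairing idea only.

```python
import numpy as np

def expand_row(name, geoms):
    # meshgrid pairs the single name with every geometry in the row;
    # the transpose and reshape turn the grid into (name, geometry) rows.
    grid = np.array(np.meshgrid(name, geoms))
    return grid.T.reshape(-1, 2)

# A row holding two geometries becomes two rows with the same name.
multi = expand_row("AB11", ["p1", "p2"])
# A row holding one geometry stays a single row.
single = expand_row("AB10", "p0")

print(multi.tolist())
print(single.tolist())
```

Note that stacking names and geometries in one array forces a common dtype, so numeric names may come back as strings or objects.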
%%timeit
ndf = pd.DataFrame(cartesian(df),columns=df.columns)
1000 loops, best of 3: 679 µs per loop
%%timeit
df.set_index(['name'])['geometry'].apply(pd.Series).stack().reset_index()
100 loops, best of 3: 5.44 ms per loop
Upvotes: 2
Reputation: 669
My current solution to the above has two steps.
Step 1: Go through each row and, if the geometry is a MultiPolygon, turn it into a list of Polygons with a list comprehension.
name geometry
0 AB10 POLYGON ((-2.116454759005259 57.14656265903432...
1 AB11 [POLYGON ((-2.052573095588467 57.1342600856536...
2 AB12 [POLYGON ((-2.128066321470298 57.0368357386797...
3 AB13 POLYGON ((-2.261525922489881 57.10693578217748...
4 AB14 POLYGON ((-2.261525922489879 57.10693578217748...
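Step 1 could look roughly like the sketch below. It assumes Shapely 2.x geometries stored in a pandas DataFrame; split_multipolygon and the two toy squares are made up for illustration and are not the real postcode data.

```python
import pandas as pd
from shapely.geometry import MultiPolygon, Polygon

def split_multipolygon(geom):
    # A MultiPolygon becomes a list of its member Polygons;
    # any other geometry is returned unchanged.
    if geom.geom_type == "MultiPolygon":
        return list(geom.geoms)
    return geom

# Toy data: one plain Polygon and one MultiPolygon made of two squares.
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
shifted = Polygon([(2, 0), (3, 0), (3, 1), (2, 1)])
df = pd.DataFrame(
    {"name": ["AB10", "AB11"], "geometry": [square, MultiPolygon([square, shifted])]}
)

df["geometry"] = df["geometry"].apply(split_multipolygon)
```

After this, the AB11 row holds a Python list of two Polygons, which is the shape step 2 expects.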
Step 2: Use the trick of expanding a list of elements in one row into multiple rows.
df.set_index(['name'])['geometry'].apply(pd.Series).stack().reset_index()
name level_1 0
0 AB10 0 POLYGON ((-2.116454759005259 57.14656265903432...
1 AB11 0 POLYGON ((-2.052573095588467 57.13426008565365...
2 AB11 1 POLYGON ((-2.045849648028651 57.13076387483844...
3 AB12 0 POLYGON ((-2.128066321470298 57.0368357386797,...
4 AB12 1 POLYGON ((-2.096125852304303 57.14808092585477...
5 AB13 0 POLYGON ((-2.261525922489881 57.10693578217748...
6 AB14 0 POLYGON ((-2.261525922489879 57.10693578217748...
Please let me know if there is a way to do this in one step!
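One possible way to fold both steps into a single chain is sketched below. It assumes pandas 0.25 or later (for DataFrame.explode) and Shapely 2.x; to_parts and the toy squares are hypothetical names and data used only for illustration. GeoPandas also offers its own GeoDataFrame.explode, which may be a better fit when the data is already a GeoDataFrame; it is not used here.

```python
import pandas as pd
from shapely.geometry import MultiPolygon, Polygon

def to_parts(geom):
    # Always return a list so that explode treats every row the same way.
    if geom.geom_type == "MultiPolygon":
        return list(geom.geoms)
    return [geom]

# Toy data: one plain Polygon and one MultiPolygon made of two squares.
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
shifted = Polygon([(2, 0), (3, 0), (3, 1), (2, 1)])
df = pd.DataFrame(
    {"name": ["AB10", "AB11"], "geometry": [square, MultiPolygon([square, shifted])]}
)

# Wrap each geometry in a list, then give each list element its own row.
out = (
    df.assign(geometry=df["geometry"].apply(to_parts))
    .explode("geometry")
    .reset_index(drop=True)
)
print(out["name"].tolist())
```

The result keeps the original column names, so no renaming of level_1 or 0 is needed afterwards.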
Upvotes: 0