konstanze

Reputation: 511

Finding the centroid of an xarray

I have an xarray DataArray that represents boolean (e.g., forest/no forest) geospatial data, with the dimensions x and y representing longitude and latitude, and I want the centroid coordinates.

import xarray as xr
import numpy as np
A = xr.DataArray(np.array([[0,0,1,1,0,0],
                           [0,1,1,1,0,0],
                           [0,1,1,1,0,0],
                           [0,1,1,0,0,0],
                           [0,0,0,0,1,0]]),
                 dims=['y','x'],
                 coords={'x': [10,20,30,40,50,60],
                         'y': [50,40,30,20,10]})

I came up with the following solution:

centr_x = float(np.sum(A.sum('y')/np.sum(A.sum('y')) * A.x))
centr_y = float(np.sum(A.sum('x')/np.sum(A.sum('x')) * A.y))

Just wondering if I'm missing a function in xarray that does just this? It seems to me that this would be a rather common thing to calculate.

Thank you for suggestions!

Upvotes: 2

Views: 696

Answers (1)

Michael Delgado

Reputation: 15442

There's nothing in xarray that interprets raster data as geometries or point collections, or that calculates the centroid of such features.

Your approach is clever - you could certainly use weighted averaging to get a simple centroid, e.g.:

In [8]: A.x.weighted(A).mean()
Out[8]:
<xarray.DataArray 'x' ()>
array(31.81818182)

In [9]: A.y.weighted(A).mean()
Out[9]:
<xarray.DataArray 'y' ()>
array(32.72727273)
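
As a sanity check, the weighted mean above is just sum(w * x) / sum(w) taken over all cells. Here is a minimal plain-Python sketch of that arithmetic, using the grid and coordinates from the question. The helper name `weighted_centroid` is made up for illustration and is not part of xarray:

```python
# Mask and coordinates copied from the question.
mask = [
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
]
xs = [10, 20, 30, 40, 50, 60]
ys = [50, 40, 30, 20, 10]


def weighted_centroid(mask, xs, ys):
    """Return (x, y) as the mask-weighted mean of the cell coordinates."""
    total = sum(sum(row) for row in mask)
    if total == 0:
        raise ValueError("mask has no nonzero cells")
    # Each cell contributes its x (column) coordinate, weighted by its value.
    cx = sum(w * x for row in mask for w, x in zip(row, xs)) / total
    # Each cell contributes its y (row) coordinate, weighted by its value.
    cy = sum(w * y for row, y in zip(mask, ys) for w in row) / total
    return cx, cy


cx, cy = weighted_centroid(mask, xs, ys)
print(cx, cy)  # 350/11 and 360/11, i.e. about 31.818 and 32.727
```

This matches the `weighted(A).mean()` results, and it also shows why an all-zero mask needs special handling: the sum of the weights is the divisor.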

You could also do this with point geometries using shapely. To do this, first convert the raster data into a series of points:

In [4]: points = A.where(A).to_series().dropna()

In [5]: points
Out[5]:
y   x
50  30    1.0
    40    1.0
40  20    1.0
    30    1.0
    40    1.0
30  20    1.0
    30    1.0
    40    1.0
20  20    1.0
    30    1.0
10  50    1.0
dtype: float64

In [6]: import shapely.geometry
   ...: points = shapely.geometry.MultiPoint(list(zip(
   ...:     points.index.get_level_values('x'),
   ...:     points.index.get_level_values('y'),
   ...: )))

In [7]: points
Out[7]: <shapely.geometry.multipoint.MultiPoint at 0x1050b76a0>

Then you can use shapely to calculate the centroid:

In [8]: points.centroid
Out[8]: <shapely.geometry.point.Point at 0x1533fada0>

In [9]: points.centroid.xy
Out[9]: (array('d', [31.818181818181817]), array('d', [32.72727272727273]))
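
The two methods agree because the centroid of a set of points is simply the arithmetic mean of their coordinates. A small sketch without shapely, using the (x, y) pairs listed in Out[5] (the variable names are illustrative only):

```python
# (x, y) pairs for the nonzero cells, in the order shown in Out[5].
pts = [
    (30, 50), (40, 50),
    (20, 40), (30, 40), (40, 40),
    (20, 30), (30, 30), (40, 30),
    (20, 20), (30, 20),
    (50, 10),
]

# Point-set centroid: average each coordinate independently.
centroid_x = sum(x for x, _ in pts) / len(pts)
centroid_y = sum(y for _, y in pts) / len(pts)
print(centroid_x, centroid_y)  # about 31.818 and 32.727
```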

Note that if your data corresponds to geospatial coordinates, you may want to compute the centroid in an equal-area projection. A similar xarray-based approach requires first converting the lat/lon coordinates to an equal-area projection, here using geopandas:


In [10]: import geopandas as gpd
    ...: xx, yy = np.meshgrid(A.x, A.y)

In [11]: points = gpd.points_from_xy(
    ...:     xx.ravel(), yy.ravel(), crs='epsg:4326'
    ...: )

In [12]: equal_area_xy = points.to_crs('+proj=cea')

In [13]: A.coords["equal_area_y"] = (
    ...:     ("y", "x"), equal_area_xy.y.reshape(A.shape)
    ...: )

In [14]: A.coords["equal_area_x"] = (
    ...:     ("y", "x"), equal_area_xy.x.reshape(A.shape)
    ...: )

In [15]: A
Out[15]:
<xarray.DataArray (y: 5, x: 6)>
array([[0, 0, 1, 1, 0, 0],
       [0, 1, 1, 1, 0, 0],
       [0, 1, 1, 1, 0, 0],
       [0, 1, 1, 0, 0, 0],
       [0, 0, 0, 0, 1, 0]])
Coordinates:
  * x             (x) int64 10 20 30 40 50 60
  * y             (y) int64 50 40 30 20 10
    equal_area_y  (y, x) float64 4.866e+06 4.866e+06 ... 1.1e+06
    equal_area_x  (y, x) float64 1.113e+06 2.226e+06 ... 6.679e+06

Now you can use the same weighted averaging method to calculate the centroid:


In [16]: equal_area_centroid_y = A.equal_area_y.weighted(A).mean()

In [17]: equal_area_centroid_x = A.equal_area_x.weighted(A).mean()

In [18]: centroid = gpd.points_from_xy(
    ...:     [equal_area_centroid_x],
    ...:     [equal_area_centroid_y],
    ...:     crs="+proj=cea",
    ...: ).to_crs("epsg:4326")

In [19]: centroid.x, centroid.y
Out[19]: (array([31.81818182]), array([31.94714021]))

Note that the latitude is meaningfully lower in the version we round-tripped through the equal-area projection. The distance between lines of longitude shrinks at higher latitudes, so an equal-area projection gives high-latitude cells less vertical extent, and the centroid ends up somewhat lower once this curvature is accounted for. This is of course still an approximation (each cell is treated as an equally weighted point), and a more precise centroid would require a much more complex workflow.
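
To see where the shift comes from, here is a rough sketch that assumes a perfectly spherical earth. On a sphere, the cylindrical equal-area projection maps latitude to y = R * sin(lat) and leaves x proportional to longitude, so only the latitude of the centroid changes. The function name `equal_area_mean_lat` is hypothetical, and this is a simplification of what the projection library does, not a replacement for it:

```python
import math

# Latitudes and the number of nonzero cells in each row of the question's grid.
lats = [50, 40, 30, 20, 10]
counts = [2, 3, 3, 2, 1]


def equal_area_mean_lat(lats, weights):
    """Weighted mean latitude taken in sin(lat) space (spherical assumption)."""
    total = sum(weights)
    mean_sin = sum(
        w * math.sin(math.radians(lat)) for lat, w in zip(lats, weights)
    ) / total
    # Invert the projection to get back to degrees of latitude.
    return math.degrees(math.asin(mean_sin))


plain = sum(w * lat for lat, w in zip(lats, counts)) / sum(counts)
projected = equal_area_mean_lat(lats, counts)
print(plain, projected)  # about 32.727 and 31.93
```

Because sin is concave between 0 and 90 degrees, averaging in sin(lat) space always pulls the result toward the equator. The spherical value here is close to, but not exactly, the 31.947 above. The small difference is likely because the projection library works on an ellipsoid rather than a sphere.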

Upvotes: 2
