Reputation: 303
I am using xarray/rasterio to do some operations on a set of GeoTiff files. The data that I am dealing with contain missing values (-9999) at some grid points. Is there any efficient pythonic way to replace these values with interpolated data using xarray?
For example I have something like:
>>> da = xr.DataArray([[1.0,2.0,3.0],[4.0,-999.0,6.0],[7.0,-999.0,9.0]],[('x',[1,2,3]),('y',[1,2,3])])
>>> da
<xarray.DataArray (x: 3, y: 3)>
array([[ 1., 2., 3.],
[ 4., -9999., 6.],
[ 7., -9999., 9.]])
Coordinates:
* x (x) int64 1 2 3
* y (y) int64 1 2 3
and I want to replace -9999 values with the interpolated values based on adjacent grid points. Any hint would be appreciated!
Upvotes: 3
Views: 3963
Reputation: 17
I still do not think there is an efficient way to do this in xarray. The suggestion by @bwc will not work properly for multidimensional arrays, because the first pass along the first dimension will already fill all gaps. This may be acceptable if your missing values are just single datapoints, but it will result in unwanted behaviour if you have larger gaps.
The simplest solution I'm aware of for your problem is using the setmisstonn
or setmisstodis
operator in CDO
(depending on how you want to interpolate).
Upvotes: 0
Reputation: 1115
You can use da.interpolate_na()
to do one dimensional interpolation of NA values - I am unsure if there is a two dimensional method.
You can convert your -999 values to NaN's and then interpolate.
da = xr.DataArray([[1.0,2.0,3.0],[4.0,-999.0,6.0],[7.0,-999.0,9.0]],[('x',[1,2,3]),('y',[1,2,3])])
da
<xarray.DataArray (x: 3, y: 3)>
array([[ 1., 2., 3.],
[ 4., -999., 6.],
[ 7., -999., 9.]])
Coordinates:
* x (x) int64 1 2 3
* y (y) int64 1 2 3
new_da = da.where(da != -999.0, np.nan)
new_da
<xarray.DataArray (x: 3, y: 3)>
array([[ 1., 2., 3.],
[ 4., nan, 6.],
[ 7., nan, 9.]])
Coordinates:
* x (x) int64 1 2 3
* y (y) int64 1 2 3
new_da.interpolate_na(dim=('y'), method='linear')
<xarray.DataArray (x: 3, y: 3)>
array([[1., 2., 3.],
[4., 5., 6.],
[7., 8., 9.]])
Coordinates:
* x (x) int64 1 2 3
* y (y) int64 1 2 3
Since it's a 1-D interpolation, you can just interpolate for each dimension separately.
Upvotes: 1