Reputation: 159
I am trying to average multiple xarray DataArrays, but the result I get is wrong. The data are not aligned along the time dimension, but I want to average the arrays element-wise: the first time step of each array with the first time step of the others, and so on, regardless of the time coordinate values.
One of my xarrays is the following:
Dimensions: (time: 9125, bnds: 2, lat: 160, lon: 320)
Coordinates:
    time       (time) object 1975-01-01 12:00:00 ... 1999-12-...
    lat        (lat) float64 -89.14 -88.03 ... 88.03 89.14
    lon        (lon) float64 0.0 1.125 2.25 ... 357.8 358.9
    height     () float64 ...
Data variables:
    time_bnds  (time, bnds) object ...
    lat_bnds   (lat, bnds) float64 ...
    lon_bnds   (lon, bnds) float64 ...
    tas        (time, lat, lon) float32 ...
and my second one is the following:
Coordinates:
    time       (time) object 2065-01-01 12:00:00 ... 2089-12-...
    lat        (lat) float64 -89.14 -88.03 ... 88.03 89.14
    lon        (lon) float64 0.0 1.125 2.25 ... 357.8 358.9
    height     () float64 ...
Data variables:
    time_bnds  (time, bnds) object ...
    lat_bnds   (lat, bnds) float64 ...
    lon_bnds   (lon, bnds) float64 ...
    tas        (time, lat, lon) float32 ...
However, I don't care whether the data are aligned on the time coordinate. I just want the mean of the temperature variable (tas) and a new xarray object holding that mean. All my xarrays have the same three dimensions (time, lat, lon) with the same sizes (9125, 160, 320).
Upvotes: 0
Views: 1464
Reputation: 15432
The idea behind xarray is that it pairs the N-dimensional array computing model of libraries such as numpy or dask.array with the label-based indexing of pandas. Xarray places a lot of importance on dimension names and coordinate labels, and I highly recommend reading the xarray docs on computation using coordinates and on automatic alignment before going any further.
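As a concrete example, adding two pandas Series with mismatched indices does not give you an element-wise sum: pandas aligns on the index labels, and since no label is shared, every value comes out NaN: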
In [23]: pd.Series([1, 2], index=[1, 2]) + pd.Series([3, 4], index=[3, 4])
Out[23]:
1 NaN
2 NaN
3 NaN
4 NaN
dtype: float64
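In the same way, xarray aligns DataArrays on their coordinate labels before doing arithmetic (an inner join by default), so adding two DataArrays whose coordinates do not overlap gives an empty result unless you align them some other way: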
In [26]: (
...: xr.DataArray([1, 2], dims=['x'], coords=[[1, 2]])
...: + xr.DataArray([3, 4], dims=['x'], coords=[[3, 4]])
...: )
Out[26]:
<xarray.DataArray (x: 0)>
array([], dtype=int64)
Coordinates:
* x (x) int64
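To see that the empty result comes from the join, here is a small sketch that repeats the example and then switches xarray's global `arithmetic_join` option to "outer" with `xr.set_options`, which reproduces the all-NaN pandas behaviour. The variable names are only for illustration.

```python
import xarray as xr

a = xr.DataArray([1, 2], dims=["x"], coords={"x": [1, 2]})
b = xr.DataArray([3, 4], dims=["x"], coords={"x": [3, 4]})

# Default arithmetic performs an inner join on the "x" index:
# the two arrays share no labels, so the sum is empty.
inner = a + b

# With an outer join every label is kept and the missing values
# become NaN, just like the pandas Series example.
with xr.set_options(arithmetic_join="outer"):
    outer = a + b
```

Neither join gives the element-wise sum you want, which is why the options below get rid of the conflicting labels instead.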
So in your case, where you want an element-wise mean across multiple arrays with the same shape but mismatched labels along the time dimension, you have a couple of options:
don't use xarray
Really, what you're trying to do is treat your DataArrays as if they were numpy arrays. You know what's really great at behaving like numpy? Numpy! :) You can access the array underlying any DataArray through the .data attribute (or .values, which always returns a numpy array even when the data are backed by dask):
mean = (x1['tas'].data + x2['tas'].data + x3['tas'].data) / 3
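Since the question asks for a new xarray object holding the mean, one possible follow-up is to wrap the numpy result back into a DataArray. This is a minimal sketch assuming all runs share identical lat/lon coordinates; `make_run` and the toy sizes are hypothetical stand-ins for the real datasets. The time dimension is deliberately left without coordinate labels, because the runs disagree on them.

```python
import numpy as np
import xarray as xr


def make_run(start):
    """Hypothetical toy Dataset: same shape for every run, different time labels."""
    time = np.arange(start, start + 4)
    tas = np.random.default_rng(start).random((4, 3, 5)).astype("float32")
    return xr.Dataset(
        {"tas": (("time", "lat", "lon"), tas)},
        coords={"time": time, "lat": np.linspace(-60, 60, 3), "lon": np.linspace(0, 288, 5)},
    )


x1, x2, x3 = make_run(0), make_run(100), make_run(200)

# Average the raw arrays position by position, ignoring the time labels.
mean_np = (x1["tas"].values + x2["tas"].values + x3["tas"].values) / 3

# Wrap the result in a DataArray, reusing lat/lon from the first run.
# "time" stays a plain dimension with no coordinate attached.
mean = xr.DataArray(
    mean_np,
    dims=("time", "lat", "lon"),
    coords={"lat": x1["lat"].values, "lon": x1["lon"].values},
    name="tas",
)
```

If you want to keep a time axis, you could attach one run's labels afterwards, but which run to pick is your call.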
change your time dimension to a positional index
Another option is to replace the time index with something that is aligned across the arrays. One easy way to do this is to drop the time index (the coordinate labels, not the dimension itself) using da.reset_index('time'), so that the arrays are matched by position:
mean = (
x1['tas'].reset_index('time')
+ x2['tas'].reset_index('time')
+ x3['tas'].reset_index('time')
) / 3
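The same idea can be written with `drop_vars`, which removes the `time` coordinate outright. This is a swapped-in alternative to `reset_index`, not the call used above, shown as a sketch on hypothetical toy data. Once `time` has no labels, xarray matches the arrays by position and only checks that their sizes agree.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)

# Hypothetical toy runs: identical shapes, non-overlapping time labels.
runs = [
    xr.DataArray(
        rng.random((4, 2, 3)),
        dims=("time", "lat", "lon"),
        coords={"time": np.arange(4) + offset},
        name="tas",
    )
    for offset in (0, 100, 200)
]

# Drop the time coordinate; "time" remains a dimension without labels,
# so the addition below pairs the arrays by position.
unlabelled = [r.drop_vars("time") for r in runs]
mean = sum(unlabelled) / len(unlabelled)
```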
Upvotes: 1
Reputation: 364
I'm not 100% sure what you want to achieve. Do you want the temporal mean over all 3 xarrays, resulting in an xarray that only has the dimensions 'lat' and 'lon'?
Then I'd suggest concatenating the DataArrays along the 'time' dimension with xr.concat and simply applying the mean function:
Example:
import numpy as np
import xarray as xr
# create some test data:
# store 3 DataArrays with random data of shape (time, lat, lon) in a list
data = []
for i in range(3):
    x = np.random.random((100, 10, 10))
    data.append(xr.DataArray(x, dims=('time', 'lat', 'lon')))
# concatenate along the time dimension
data_concat = xr.concat(data, dim='time')
# compute the mean over time; the result has dimensions (lat, lon)
data_concat.mean('time')
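Note that this collapses time entirely. If you want the element-wise mean the question describes (step k of the result averaging step k of every array), a variant of the same concat idea is to drop the conflicting time labels and stack the arrays along a new dimension instead. This is a sketch on hypothetical toy data; the dimension name `run` is an arbitrary choice.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(42)

# Three hypothetical runs with the same (time, lat, lon) shape but different time labels.
runs = [
    xr.DataArray(
        rng.random((5, 2, 2)),
        dims=("time", "lat", "lon"),
        coords={"time": np.arange(5) + 1000 * i},
    )
    for i in range(3)
]

# Element-wise mean: remove the time labels, stack along a new "run"
# dimension, then average over "run". Time steps are paired by position.
stacked = xr.concat([r.drop_vars("time") for r in runs], dim="run")
ensemble_mean = stacked.mean("run")

# Temporal mean, as in the example above: concatenate along "time"
# and average over it, leaving only (lat, lon).
temporal_mean = xr.concat(runs, dim="time").mean("time")
```

The first result keeps the (time, lat, lon) shape of a single run; the second is a single (lat, lon) field.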
Upvotes: 1