Alex

Reputation: 159

Averaging multiple xarray DataArrays results in no data, errors, or wrong answer

I am trying to average multiple xarray DataArrays, but the result I get is wrong. The data are not aligned along the time dimension, but I want to average the arrays element-wise, matching each time step by position across the arrays regardless of its time coordinate.

One of my xarray Datasets is the following:

Dimensions:
time: 9125, bnds: 2, lat: 160, lon: 320
Coordinates:
time   (time)    object    1975-01-01 12:00:00 ... 1999-12-...
lat    (lat)     float64   -89.14 -88.03 ... 88.03 89.14
lon    (lon)     float64   0.0 1.125 2.25 ... 357.8 358.9
height ()        float64   ...
Data variables:
time_bnds    (time, bnds)   object    ...
lat_bnds     (lat, bnds)    float64   ...
lon_bnds     (lon, bnds)    float64   ...
tas          (time, lat, lon)   float32   ...

and my second xarray Dataset is the following:

time     (time)    object    2065-01-01 12:00:00 ... 2089-12-...
lat      (lat)     float64   -89.14 -88.03 ... 88.03 89.14
lon      (lon)     float64   0.0 1.125 2.25 ... 357.8 358.9
height   ()        float64   ...
Data variables:
time_bnds    (time, bnds)       object    ...
lat_bnds     (lat, bnds)        float64   ...
lon_bnds     (lon, bnds)        float64   ...
tas          (time, lat, lon)   float32   ...

However, I do not care whether the data are aligned on the time coordinate. I just want to compute the mean of the temperature variable (tas) and create a new xarray object holding that mean. All my arrays have the same three dimensions (time, lat, lon) with the same sizes (9125, 160, 320).

Upvotes: 0

Views: 1464

Answers (2)

Michael Delgado

Reputation: 15432

The idea behind xarray is that it pairs the features of an N-dimensional array computing model, such as numpy or dask.array, with the label-based indexing of pandas. Xarray places a huge amount of importance on dimension names and coordinate labels, and I highly recommend checking out the xarray docs on computation using coordinates and on automatic alignment before diving in any further.

As a concrete example, adding two pandas Series with mismatched indices does not give you an element-wise sum. Pandas aligns on the index labels first, so every entry ends up NaN:

In [23]: pd.Series([1, 2], index=[1, 2]) + pd.Series([3, 4], index=[3, 4])
Out[23]:
1   NaN
2   NaN
3   NaN
4   NaN
dtype: float64

Similarly, you cannot add two xarray DataArrays with misaligned coordinates without aligning them somehow. xarray's default for arithmetic is an inner join on the labels, so here nothing survives:

In [26]: (
    ...:     xr.DataArray([1, 2], dims=['x'], coords=[[1, 2]])
    ...:     + xr.DataArray([3, 4], dims=['x'], coords=[[3, 4]])
    ...: )
Out[26]:
<xarray.DataArray (x: 0)>
array([], dtype=int64)
Coordinates:
  * x        (x) int64

So in your case, trying to do an element-wise mean across multiple arrays with the same shape but mismatched labels along the time dimension, you have a couple of options:

  1. don't use xarray

    really, what you're trying to do is to treat your DataArrays like they are numpy arrays. You know what's really great at behaving like numpy? Numpy! :) You can access the array underlying any DataArray using the .data attribute. Note that the result is then a plain numpy (or dask) array with no coordinates attached:

    mean = (x1['tas'].data + x2['tas'].data + x3['tas'].data) / 3
    
  2. change your time dimension to a positional index

    another option is replacing your time labels with something that is aligned across the arrays. One easy way to do this is to drop the time index using da.reset_index('time'). The time dimension itself stays, but it is then matched purely by position:

    mean = (
        x1['tas'].reset_index('time')
        + x2['tas'].reset_index('time')
        + x3['tas'].reset_index('time')
    ) / 3
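Both options can be sketched on small synthetic data. Everything here is an assumption made for illustration: the make_tas helper is hypothetical, the arrays are tiny stand-ins for the real tas variables, and pandas timestamps replace the cftime objects in the question. The sketch also passes drop=True to reset_index so the old time labels are discarded outright, and uses DataArray.copy(data=...) to put the numpy result back into a labelled array:

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_tas(start, fill):
    """Hypothetical helper: a small (time, lat, lon) array of constant values."""
    return xr.DataArray(
        np.full((4, 2, 3), fill, dtype="float32"),
        dims=("time", "lat", "lon"),
        coords={
            "time": pd.date_range(start, periods=4, freq="D"),
            "lat": [-45.0, 45.0],
            "lon": [0.0, 120.0, 240.0],
        },
        name="tas",
    )


x1 = make_tas("1975-01-01", 1.0)
x2 = make_tas("2065-01-01", 3.0)

# Label-based arithmetic: no shared time stamps, so the time axis is empty.
aligned = (x1 + x2) / 2

# Option 1: average the raw arrays by position, then reuse x1's labels.
mean_np = x1.copy(data=(x1.data + x2.data) / 2)

# Option 2: discard the time labels so xarray matches by position.
mean_pos = (
    x1.reset_index("time", drop=True) + x2.reset_index("time", drop=True)
) / 2

print(aligned.sizes["time"], mean_np.shape, mean_pos.shape)
```

Option 1 keeps the time stamps of x1 on the result, which may or may not be meaningful for an average across periods. Option 2 leaves the time dimension without labels, so a new coordinate (for example day of simulation) would need to be assigned afterwards if one is wanted.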
    

Upvotes: 1

Mathi

Reputation: 364

Not 100% sure what you want to achieve. So you'd like to take the temporal mean over all 3 arrays, resulting in an array that just has the dimensions 'lat' and 'lon'?

Then I'd suggest concatenating the DataArrays along the dimension 'time' using xr.concat and simply applying the mean function:

Example:

import numpy as np
import xarray as xr

# create some test data:
# store 3 DataArrays with random data of shape (time, lat, lon) in a list
data = []
for i in range(3):
    x = np.random.random((100, 10, 10))
    data.append(xr.DataArray(x, dims=('time', 'lat', 'lon')))

# concatenate along the time dimension
data_concat = xr.concat(data, dim='time')
# compute the mean over time, leaving dimensions (lat, lon)
data_concat.mean('time')
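If the goal is instead an element-wise mean that keeps the time axis, as the question describes, a variation on the same idea is to concatenate along a new dimension rather than along 'time'. This is only a sketch under assumptions: the dimension name 'member' is made up for illustration, and the test arrays carry no time coordinate. Arrays with mismatched time labels would need those labels dropped first, otherwise concat would align on them.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)

# Three unlabelled test arrays of shape (time, lat, lon).
data = [
    xr.DataArray(rng.random((100, 10, 10)), dims=("time", "lat", "lon"))
    for _ in range(3)
]

# Stack along a new, hypothetical 'member' dimension and average over it.
stacked = xr.concat(data, dim="member")
elementwise_mean = stacked.mean("member")  # dims: (time, lat, lon)

# For comparison, the temporal mean from the example above.
temporal_mean = xr.concat(data, dim="time").mean("time")  # dims: (lat, lon)

print(elementwise_mean.dims, temporal_mean.dims)
```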

Upvotes: 1
