swhite

Reputation: 31

Is there a faster way to sum Xarray dataset variables?

This is my first ever Stack Exchange question, so I hope I'm doing this correctly.

I am trying to sum together a few xarray variables in a dataset. Each variable has the same dimensions. The code looks essentially like this:

def add_variables(xarray_dataset, listofvars):
    data = 0
    for var in listofvars:
        data = data + dset[var][:,-1,:] # slice of each variable
    return data 

summed_variables = add_variables(dset, ['varname1', 'varname2'])

However, this takes forever to run. Does anyone have a suggestion for a faster way to go about this? Thank you!

Upvotes: 3

Views: 3463

Answers (1)

astoeriko

Reputation: 890

You can use the to_array method to stack the variables along a new dimension (which is by default named "variable") and then take the sum over this dimension. You can select variables and slice them beforehand if necessary.

import numpy as np
import xarray as xr

# Create dummy dataset
ds = xr.Dataset(
    {var: (("x", "y", "z"), np.random.rand(5, 3, 2)) for var in "abcde"}
)

# Sum over (a slice of some of the) variables
vars_to_sum = ["a", "c", "d"]
summed_variables = ds[vars_to_sum].isel(y=-1).to_array().sum("variable")
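As a sanity check, the stacked sum can be compared against a plain loop over the same slices. This is only a minimal sketch on the dummy dataset from above; the names stacked_sum and loop_sum are made up for illustration, and the loop uses Python's built-in sum rather than the function from the question.

```python
import numpy as np
import xarray as xr

# Dummy dataset with the same layout as in the example above
ds = xr.Dataset(
    {var: (("x", "y", "z"), np.random.rand(5, 3, 2)) for var in "abcde"}
)
vars_to_sum = ["a", "c", "d"]

# Stack the selected variables along "variable", then sum over that dimension
stacked_sum = ds[vars_to_sum].isel(y=-1).to_array().sum("variable")

# Reference result: add the sliced variables one by one
loop_sum = sum(ds[var].isel(y=-1) for var in vars_to_sum)

# Raises an AssertionError if the two results differ beyond rounding error
xr.testing.assert_allclose(stacked_sum, loop_sum)
print(stacked_sum.dims, stacked_sum.shape)
```

If the assertion passes, both approaches give the same values, so the choice between them comes down to readability and speed.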

I think the to_array approach is simpler than your custom function, although it was not faster in my comparison [1]:

%timeit add_variables(ds, vars_to_sum)
464 µs ± 591 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit ds[vars_to_sum].isel(y=-1).to_array().sum("variable")
660 µs ± 12.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

However, for this small dataset, both of them are pretty fast so the difference is not noticeable. I don't know what your dataset looks like – it would probably help if you could share some more information about the data in order to diagnose performance issues.
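For example, something like the following would show the kind of information that could help. It is only a rough sketch: describe_variables is a hypothetical helper written for illustration, not part of xarray, and it is run here on a dummy dataset rather than your data.

```python
import numpy as np
import xarray as xr

# Dummy dataset standing in for the real one (assumption: yours is much larger)
ds = xr.Dataset(
    {var: (("x", "y", "z"), np.random.rand(5, 3, 2)) for var in "abcde"}
)


def describe_variables(dataset, names):
    """Collect basic facts about each variable (hypothetical helper)."""
    info = {}
    for name in names:
        da = dataset[name]
        info[name] = {
            "dims": da.dims,
            "shape": da.shape,
            "dtype": str(da.dtype),
            "nbytes": da.nbytes,   # in-memory size of the values in bytes
            "chunks": da.chunks,   # None unless the data is a chunked dask array
        }
    return info


summary = describe_variables(ds, ["a", "c", "d"])
for name, facts in summary.items():
    print(name, facts)
```

The shapes, sizes and whether the variables are chunked are the sort of details that would make it easier to guess where the time is going.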

[1] I had to change your function slightly to make it run, because the dataset name in the function header did not match the one used in the body:

def add_variables(xarray_dataset, listofvars):
    data = 0
    for var in listofvars:
        # changed dset to xarray_dataset in the following line
        data = data + xarray_dataset[var][:,-1,:]
    return data 

Upvotes: 3
