Mathi

Reputation: 364

Why does manipulating xarrays take up so much memory permanently?

I'm new to xarray (I use it inside Jupyter notebooks), and so far everything has worked like a charm. The exception came when I started to look at how much RAM my functions use (e.g. with htop), which confuses me, and I didn't find anything about it on Stack Exchange.

I am combining monthly data into yearly means, taking month lengths into account, masking NaN values and also using specific months only, which requires groupby and resample. According to the memory profiler, these operations temporarily take up ~15 GB of RAM, which in itself is not a problem because I have 64 GB of RAM at hand. Nonetheless, some memory seems to stay blocked permanently, even though I call these methods inside a function. The function below blocks ~4 GB of memory although the resulting xarray object is only ~440 MB in size (55*10**6 float64 entries), and more complex operations block even more. Explicitly using del, gc.collect() or DataArray.close() inside the function did not change anything.

A basic function to compute a weighted yearly mean from monthly data looks like this:

import xarray as xr
test=xr.open_dataset(path)['pr']

def weighted_temporal_mean(ds):
    """
    Taken  from https://ncar.github.io/esds/posts/2021/yearly-averages-xarray/
    Compute yearly average from monthly data taking into account month length and 
    masking nan values
    """
    # Determine the month length
    month_length = ds.time.dt.days_in_month

    # Calculate the weights
    wgts = month_length.groupby("time.year") / month_length.groupby("time.year").sum()

    # Setup our masking for nan values
    cond = ds.isnull()
    ones = xr.where(cond, 0.0, 1.0)

    # Calculate the numerator
    obs_sum = (ds * wgts).resample(time="AS").sum(dim="time")

    # Calculate the denominator
    ones_out = (ones * wgts).resample(time="AS").sum(dim="time")

    # Return the weighted average
    return obs_sum / ones_out

wm=weighted_temporal_mean(test)
print("nbytes in MB:", wm.nbytes / (1024*1024))

Any idea how to ensure that the memory is freed up, or am I overlooking something and this behavior is actually expected?

Thank you!

Upvotes: 1

Views: 1816

Answers (1)

jsbueno

Reputation: 110186

The only hypothesis I have for this behavior is that some of the operations involving the passed-in ds modify it in place, increasing its size. Apart from the returned object, it is the only object that should survive once the function has finished executing.

That can easily be verified by using del on the ds structure passed as input, after the function has run. (If you need the data afterwards, re-read it, or make a deep copy before calling the function.)
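The check described above can be sketched with the standard library alone. This is only an illustration under stated assumptions: grow_in_place is a hypothetical stand-in for a function that enlarges its input as a side effect, a plain dict of bytearrays stands in for the opened dataset, and tracemalloc is used in place of htop to read how much memory Python currently holds.

```python
import gc
import tracemalloc


def grow_in_place(data):
    """Hypothetical stand-in for a function that enlarges its input as a side effect."""
    data["cache"] = bytearray(5_000_000)  # the input now holds ~5 MB more
    return len(data["values"])


def traced_bytes():
    """Return the bytes currently traced by tracemalloc, after a garbage collection."""
    gc.collect()
    current, _peak = tracemalloc.get_traced_memory()
    return current


tracemalloc.start()
baseline = traced_bytes()

data = {"values": bytearray(1_000_000)}  # stand-in for the opened dataset
result = grow_in_place(data)
held_with_input = traced_bytes() - baseline

del data  # drop the input, as suggested above
held_without_input = traced_bytes() - baseline
tracemalloc.stop()

print(f"held while input is alive: {held_with_input / 1e6:.1f} MB")
print(f"held after deleting input: {held_without_input / 1e6:.1f} MB")
```

If the second figure drops back towards zero, the extra memory was attached to the input object rather than leaked. With the real data, the stand-in would be the object read with xr.open_dataset. Keep in mind that process-level tools such as htop can still report a higher number, because memory freed by Python is not always handed back to the operating system right away.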

If that does not resolve the problem, then this is a problem in xarray itself, and I'd advise you to open an issue with the xarray project.

Upvotes: 2
