Reputation: 364
I'm new to using Xarray (inside Jupyter notebooks), and up to now everything has worked like a charm, except that when I started to look at how much RAM my functions use (e.g. with htop), the numbers confused me (I didn't find anything on Stack Exchange).
I am combining monthly data into yearly means, taking into account month lengths, masking NaN values and also using only specific months, which requires groupby and resample. As I can see with the memory profiler, these operations temporarily take up ~15 GB of RAM, which in itself is not a problem because I have 64 GB at hand. Nonetheless, some memory seems to stay blocked permanently, even though I call these methods inside a function. For the function below it blocks ~4 GB of memory, although the resulting DataArray is only ~440 MB (55*10**6 float64 entries); with more complex operations it blocks even more. Explicitly using del, gc.collect() or DataArray.close() inside the function did not change anything.
A basic function to compute a weighted yearly mean from monthly data looks like this:
import xarray as xr

test = xr.open_dataset(path)['pr']

def weighted_temporal_mean(ds):
    """
    Taken from https://ncar.github.io/esds/posts/2021/yearly-averages-xarray/
    Compute the yearly average from monthly data, taking into account month
    length and masking NaN values.
    """
    # Determine the month length
    month_length = ds.time.dt.days_in_month

    # Calculate the weights
    wgts = month_length.groupby("time.year") / month_length.groupby("time.year").sum()

    # Set up the masking for NaN values
    cond = ds.isnull()
    ones = xr.where(cond, 0.0, 1.0)

    # Calculate the numerator
    obs_sum = (ds * wgts).resample(time="AS").sum(dim="time")

    # Calculate the denominator
    ones_out = (ones * wgts).resample(time="AS").sum(dim="time")

    # Return the weighted average
    return obs_sum / ones_out

wm = weighted_temporal_mean(test)
print("nbytes in MB:", wm.nbytes / (1024 * 1024))
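For reference, this is roughly how I am checking the memory usage around the call (a minimal sketch, assuming psutil is installed; memory_profiler reports similar numbers):

import gc
import os

import psutil  # assumed to be installed; used only to read the process RSS

def rss_mb():
    # Resident set size of the current process, in MB
    return psutil.Process(os.getpid()).memory_info().rss / (1024 * 1024)

print("RSS before:", rss_mb())
wm = weighted_temporal_mean(test)
del wm
gc.collect()
print("RSS after:", rss_mb())  # in my case this stays roughly 4 GB above the baseline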
Any idea how to ensure that the memory is freed up, or am I overlooking something and this behavior is actually expected?
Thank you!
Upvotes: 1
Views: 1816
Reputation: 110186
The only hypothesis I have for this behavior is that some of the operations involving the passed-in ds modify it in place, increasing its size, since, apart from the returned object, that is the only object that should survive after the function finishes.
That can easily be verified by calling del on the ds structure used as input after the function has run. (If you need the data afterwards, re-read it, or make a deep copy before calling the function.)
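A minimal sketch of both checks, reusing the variable names from the question (whether the RAM is actually released is exactly what you would be testing):

import gc

# Check 1: drop the input right after the call and see whether the memory is released
wm = weighted_temporal_mean(test)
del test
gc.collect()

# Check 2: protect the original by handing the function a deep copy instead
test = xr.open_dataset(path)['pr']  # re-read, since it was deleted above
wm = weighted_temporal_mean(test.copy(deep=True))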
If that does not resolve the problem, then this is an issue with xarray itself, and I'd advise you to open an issue on their project's tracker.
Upvotes: 2