Reputation: 33
I am trying to calculate the distribution of a variable in an xarray Dataset. I can get the result I want by converting the Dataset to a pandas DataFrame as follows:
import numpy as np
import pandas as pd
import xarray as xr

lon = np.linspace(0, 10, 11)
lat = np.linspace(0, 10, 11)
time = np.linspace(0, 10, 1000)
temperature = 3 * np.random.randn(len(lat), len(lon), len(time))
ds = xr.Dataset(
    data_vars=dict(
        temperature=(["lat", "lon", "time"], temperature),
    ),
    coords=dict(
        lon=lon,
        lat=lat,
        time=time,
    ),
)

# Bin edges; each bin is labelled by its centre.
bin_t = np.linspace(-10, 10, 21)
DS = ds.to_dataframe()
DS.loc[:, 'temperature_bin'] = pd.cut(DS['temperature'], bin_t, labels=(bin_t[0:-1] + bin_t[1:]) * 0.5)
# Count samples per (lat, lon, bin), then convert back to xarray.
DS_stats = DS.reset_index().groupby(['lat', 'lon', 'temperature_bin']).count()
ds_stats = DS_stats.to_xarray()
<xarray.Dataset>
Dimensions:          (lat: 11, lon: 11, temperature_bin: 20)
Coordinates:
  * lat              (lat) float64 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0
  * lon              (lon) float64 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0
  * temperature_bin  (temperature_bin) float64 -9.5 -8.5 -7.5 ... 7.5 8.5 9.5
Data variables:
    time             (lat, lon, temperature_bin) int64 0 1 8 13 18 ... 9 5 3 0
    temperature      (lat, lon, temperature_bin) int64 0 1 8 13 18 ... 9 5 3 0
Is there a way to generate ds_stats without converting to a dataframe? I have tried groupby_bins, but it does not preserve the lat and lon coordinates: the counts are pooled over the whole grid.
print(ds.groupby_bins('temperature',bin_t).count())
distributed.utils_perf - WARNING - full garbage collections took 21% CPU time recently (threshold: 10%)
<xarray.Dataset>
Dimensions:           (temperature_bins: 20)
Coordinates:
  * temperature_bins  (temperature_bins) object (-10.0, -9.0] ... (9.0, 10.0]
Data variables:
    temperature       (temperature_bins) int64 121 315 715 1677 ... 709 300 116
Upvotes: 2
Views: 585
Reputation: 180
xhistogram may help here.
With the definitions from your question,
from xhistogram import xarray as xhist
ds_stats = xhist.histogram(ds.temperature, bins=bin_t, dim=['time'])
should do the trick: it histograms along time only, so lat and lon are kept as dimensions.
The one difference is that it returns a DataArray, not a Dataset, so if you want to do it for multiple variables, I believe you'll have to do it separately for each one and then recombine.
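If it helps, here is a rough sketch of that "one variable at a time, then recombine" step. To keep it dependency-free it does not use xhistogram; it swaps in xr.apply_ufunc with np.histogram instead, so treat it as an alternative rather than what xhistogram does internally. The helper name hist_along_dim and the second variable humidity are made up for illustration. Also note that np.histogram bins are closed on the left, while pd.cut bins are closed on the right by default, so values that fall exactly on an edge can land in a different bin than in the dataframe version.

```python
import numpy as np
import xarray as xr


def hist_along_dim(da, bins, dim="time"):
    """Count values of `da` per bin along `dim`, keeping all other dims.

    Hypothetical helper for illustration, not part of xarray or xhistogram.
    """
    bins = np.asarray(bins)
    centers = 0.5 * (bins[:-1] + bins[1:])
    bin_dim = f"{da.name}_bin"

    def _hist(values):
        # apply_ufunc moves `dim` to the last axis; histogram each 1-D slice.
        return np.apply_along_axis(
            lambda v: np.histogram(v, bins=bins)[0], -1, values
        )

    counts = xr.apply_ufunc(
        _hist,
        da,
        input_core_dims=[[dim]],
        output_core_dims=[[bin_dim]],
    )
    # Label each bin by its centre, as in the question.
    return counts.assign_coords({bin_dim: centers})


# Same grid as in the question, plus a made-up second variable.
rng = np.random.default_rng(0)
lon = np.linspace(0, 10, 11)
lat = np.linspace(0, 10, 11)
time = np.linspace(0, 10, 1000)
shape = (len(lat), len(lon), len(time))
ds = xr.Dataset(
    data_vars=dict(
        temperature=(["lat", "lon", "time"], 3 * rng.standard_normal(shape)),
        humidity=(["lat", "lon", "time"], 3 * rng.standard_normal(shape)),
    ),
    coords=dict(lon=lon, lat=lat, time=time),
)
bin_t = np.linspace(-10, 10, 21)

# One histogram per variable, recombined into a single Dataset.
ds_stats = xr.Dataset(
    {name: hist_along_dim(ds[name], bin_t) for name in ds.data_vars}
)
```

Each variable ends up with its own bin dimension (temperature_bin, humidity_bin), so the variables do not need to share bin edges; pass a different bins array per variable if that is what you need.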
Upvotes: 1