Reputation: 21
I am using xarray
in Python (Spyder) to read large NetCDF files and process them.
import xarray as xr
ds = xr.open_dataset('my_file.nc')
ds
has the following dimensions and variables:
<xarray.Dataset>
Dimensions: (time: 62215, points: 2195)
Coordinates:
* time (time) datetime64[ns] 1980-04-01 ... 2021-09-30T21:00:00
Dimensions without coordinates: points
Data variables:
longitude (time, points) float32 ...
latitude (time, points) float32 ...
hs (time, points) float32 ...
I want to calculate the 95th percentile of the variable hs
for each point and add the result to the dataset as a new variable:
hs_95 (points) float32
I do this with one line of code:
ds['hs_95'] = ds.hs.quantile(0.95, dim='time')
where ds.hs is an xr.DataArray.
However, this takes a very long time to run. Is there anything I can do to make it faster? Is xarray
the most convenient tool for this application?
Upvotes: 2
Views: 1070
Reputation: 15452
Migrating my comment into an answer...
xarray loads data from netCDF files lazily, reading only the parts of the data that an operation requests. So the first time you work with the data, you pay the read time plus the quantile time. The quantile computation may still be slow, but for a meaningful benchmark you should first load the dataset into memory with xr.Dataset.load(), e.g.:
ds = ds.load()
Alternatively, you can load the data and close the file object in one step with xr.load_dataset(filepath).
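To separate read time from compute time, something along these lines may help. This is only a sketch: the dataset here is a small synthetic stand-in (the sizes and random values are made up, not taken from the question), so load() has nothing to read from disk, whereas on a file-backed dataset it would pull everything into memory before the timer starts.

```python
import time

import numpy as np
import xarray as xr

# Synthetic stand-in for the real file; sizes are hypothetical and much
# smaller than the (62215, 2195) array in the question.
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"hs": (("time", "points"), rng.random((2000, 50), dtype=np.float32))}
)

# On a file-backed dataset this reads the data into memory.
# On this in-memory example it is effectively a no-op.
ds = ds.load()

# Time only the quantile step, now that reading is out of the way.
start = time.perf_counter()
ds["hs_95"] = ds.hs.quantile(0.95, dim="time")
elapsed = time.perf_counter() - start

print(f"quantile only: {elapsed:.3f} s")
```

With the real file, timing the same line before and after load() should show how much of the total was disk I/O.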
That said, you should definitely heed @tekiz's great advice to use skipna=False if you can. The performance improvement can be on the order of 100x when NaNs don't have to be skipped during the quantile computation, provided you're sure your data contain no NaNs.
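A rough way to check that skipna=False is safe for your data and gives the same answer is sketched below. It uses made-up random data rather than the real file, and the names q_skip and q_fast are just for illustration. The actual speedup will depend on your data and your xarray/NumPy versions, so it is worth timing both calls yourself.

```python
import numpy as np
import xarray as xr

# Hypothetical NaN-free data standing in for ds.hs
rng = np.random.default_rng(0)
hs = xr.DataArray(rng.random((2000, 50)), dims=("time", "points"), name="hs")

# skipna=False is only appropriate when there are no NaNs; with NaNs
# present, the affected points would come out as NaN.
has_nans = bool(hs.isnull().any())
print("contains NaNs:", has_nans)

# NaN-aware path (what float data gets by default)
q_skip = hs.quantile(0.95, dim="time", skipna=True)

# Plain quantile path, which can be considerably faster
q_fast = hs.quantile(0.95, dim="time", skipna=False)

# On NaN-free data both paths should agree
print("results match:", bool(np.allclose(q_skip.values, q_fast.values)))
```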
Upvotes: 1
Reputation: 53
Can you try passing skipna=False to the xarray.DataArray.quantile() method? This could help a bit.
Upvotes: 1