Reputation: 463
I'm trying to read a timeseries of a single WRF output variable. The time series is distributed, one timestamp per file, across more than 5000 netCDF files. Each file contains roughly 200 variables.
Is there a way to call xarray.open_mfdataset() for only the variable I'm interested in? I can specify a single variable by providing a list to the 'data_vars' argument, but it still reads everything for the 'minimal' case. For my files the 'minimal' case includes almost everything and is thus relatively slow.
Is my best bet to create a single netCDF file containing my variable of interest with something like ncrcat, or is there a more streamlined way to do this entirely within xarray (or some other python tool)?
My netCDF files are netCDF4 (not netCDF4-classic), which seems to rule out netCDF4.MFDataset().
Upvotes: 4
Views: 3392
Reputation: 161
Another option is to pass a preprocessing function that selects the variables to keep via the "preprocess" keyword argument of open_mfdataset(), e.g.:
preprocess=lambda ds: ds[variablelist]
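For context, here is a minimal sketch of how that lambda might fit into a full call. The helper names `make_selector` and `open_selected` and the "Time" concatenation dimension are assumptions for illustration, not something from the original post; adjust them to match your files. Each file is still opened once, but only the selected variables are carried into the concatenation.

```python
def make_selector(variablelist):
    """Return a preprocess function that keeps only the named variables."""
    names = list(variablelist)

    def _select(ds):
        # Indexing a Dataset with a list of names returns a Dataset
        # holding just those variables (plus their coordinates).
        return ds[names]

    return _select


def open_selected(paths, variablelist, concat_dim="Time"):
    """Open many files, keeping only `variablelist` from each one."""
    # Imported here so make_selector itself has no dependencies.
    import xarray as xr

    return xr.open_mfdataset(
        paths,
        preprocess=make_selector(variablelist),
        combine="nested",       # concatenate in the order the paths are given
        concat_dim=concat_dim,  # assumed name of the time dimension
    )
```

A call would then look something like `open_selected(sorted(paths), ["T2"])`, where `paths` is your list of files and "T2" stands in for whatever variable you are after.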
Upvotes: 2
Reputation: 9
As a follow-up for anyone who finds this thread later: according to the documentation (though it is a bit hidden), the "data_vars=" argument only works with Python 3.9.
Upvotes: 0
Reputation: 1115
I'm not sure why providing the data_vars= argument still reads all data - I experienced the same issue reading WRF output. My workaround was to make a list of all the variables I didn't need (all 200+) and feed that to the drop_variables= argument. You can get a list of all variables and then just delete or comment out the ones you want to keep.
varlist = list(ds.variables)
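Rather than editing the list by hand, the drop list could also be built programmatically. The sketch below is one way to do that, not the only one: `build_drop_list` and `open_without` are hypothetical helper names, and the "Time" dimension is an assumption about the files. It reads the variable names from the first file and passes everything except the ones to keep to drop_variables=, which open_mfdataset() forwards to open_dataset().

```python
def build_drop_list(all_names, keep):
    """Return every name in `all_names` that is not in `keep`, in order."""
    keep = set(keep)
    missing = keep - set(all_names)
    if missing:
        # Fail early on a typo instead of silently dropping everything.
        raise KeyError(f"variables not found: {sorted(missing)}")
    return [name for name in all_names if name not in keep]


def open_without(paths, keep, concat_dim="Time"):
    """Open many files, dropping all variables except those in `keep`."""
    import xarray as xr

    # Read the variable names once, from the first file only.
    with xr.open_dataset(paths[0]) as first:
        all_names = list(first.variables)

    return xr.open_mfdataset(
        paths,
        drop_variables=build_drop_list(all_names, keep),
        combine="nested",       # concatenate in the order the paths are given
        concat_dim=concat_dim,  # assumed name of the time dimension
    )
```

Because list(ds.variables) includes coordinate variables as well, add any coordinates you still need to `keep`.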
Upvotes: 1