Reputation: 463

xarray.open_mfdataset for a small subset of variables

I'm trying to read a time series of a single WRF output variable. The time series is spread across more than 5000 netCDF files, one timestamp per file, and each file contains roughly 200 variables.

Is there a way to make xarray.open_mfdataset() read only the variable I'm interested in? I can pass a list containing just that variable to the 'data_vars' argument, but it still reads everything the 'minimal' case would. For my files the 'minimal' case includes almost every variable, so this is relatively slow.
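A small in-memory sketch of why this happens, assuming (as in WRF output) that every variable carries the Time dimension. open_mfdataset joins the files with xarray.concat, which always concatenates variables that already contain the concatenation dimension; a data_vars list only adds variables on top of that 'minimal' set, it does not exclude the others. The variable names T2 and U are illustrative only.

```python
import xarray as xr

# Two single-timestep "files" held in memory; names are hypothetical.
# Both variables carry the Time dimension, like most WRF fields.
steps = [
    xr.Dataset(
        {
            "T2": (("Time", "x"), [[float(t), float(t)]]),
            "U": (("Time", "x"), [[0.0, 0.0]]),
        }
    )
    for t in range(2)
]

# Ask for only T2 to be concatenated...
combined = xr.concat(steps, dim="Time", data_vars=["T2"])

# ...but U also contains Time, so it is concatenated (and therefore read) too.
print(sorted(combined.data_vars))
print(combined["U"].sizes["Time"])
```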

Is my best bet to create a single netCDF file containing just my variable of interest with something like ncrcat, or is there a more streamlined way to do this entirely within xarray (or some other Python tool)?

My netCDF files are netCDF4 (not netCDF4-classic), which seems to rule out netCDF4.MFDataset().

Upvotes: 4

Answers (3)

momme

Reputation: 161

Another option is to pass a preprocessing function via the "preprocess" keyword argument that keeps only the variables you want, e.g.:

preprocess=lambda ds: ds[variablelist]
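A slightly fuller sketch of the preprocess approach, wrapped in two small helpers. The helper names (keep_only, open_variables) are hypothetical. The combine="nested" / concat_dim="Time" choice is an assumption about WRF output: each file holds one timestep along a Time dimension that has no coordinate variable, so the files are joined in the order given rather than by coordinate values.

```python
import xarray as xr


def keep_only(names):
    """Return a preprocess callable that keeps only the listed variables."""
    names = list(names)

    def _preprocess(ds: xr.Dataset) -> xr.Dataset:
        # Indexing a Dataset with a list returns a new Dataset holding just
        # those variables (plus the coordinates they depend on).
        return ds[names]

    return _preprocess


def open_variables(paths, names, concat_dim="Time"):
    """Open many single-timestep files, keeping only `names`.

    Assumes (WRF-style) one timestep per file along a `concat_dim`
    dimension without a coordinate variable, so files are combined
    in list order with combine="nested".
    """
    return xr.open_mfdataset(
        paths,
        preprocess=keep_only(names),
        combine="nested",
        concat_dim=concat_dim,
    )
```

Usage, assuming files matching a hypothetical pattern such as wrfout_d01_* and a variable named T2: ds = open_variables(sorted(glob.glob("wrfout_d01_*")), ["T2"]) (with import glob). Sorting matters because nested combining relies on list order. Each file is still opened, but the subsetting happens before the files are combined, so only the selected variables should take part in the concatenation.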

Upvotes: 2

Nathan G.

Reputation: 9

As a follow-up for anyone who finds this thread later: according to the documentation (though it's a bit hidden), the "data_vars=" argument only works with Python 3.9.

Upvotes: 0

bwc

Reputation: 1115

I'm not sure why providing the data_vars= argument still reads all the data; I ran into the same issue reading WRF output. My workaround was to make a list of all the variables I didn't need (all 200+) and pass it to the drop_variables= argument. You can get a list of all the variables from a dataset opened from one of the files, then delete or comment out the ones you want to keep:

varlist = list(ds.variables)
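Instead of editing the list by hand, the drop list can be built from one sample file. This is a minimal sketch, assuming every file has the same variables as the first one. drop_variables is an xarray.open_dataset argument that open_mfdataset forwards to each file it opens. The helper names are hypothetical, and concatenating along "Time" with combine="nested" is an assumption about WRF-style files (one timestep each, no Time coordinate variable).

```python
import xarray as xr


def drop_list(all_names, keep):
    """Return every name in `all_names` not in `keep`, preserving order."""
    keep = set(keep)
    return [name for name in all_names if name not in keep]


def open_kept_variables(paths, keep, concat_dim="Time"):
    """Open many files, dropping every variable except those in `keep`.

    Anything not listed in `keep` is dropped, including coordinate
    variables, so add any coordinates you still need to `keep`.
    """
    paths = list(paths)
    # Only the metadata of the first file is read to list its variables.
    with xr.open_dataset(paths[0]) as sample:
        to_drop = drop_list(sample.variables, keep)
    # drop_variables is passed through to open_dataset for every file.
    return xr.open_mfdataset(
        paths,
        drop_variables=to_drop,
        combine="nested",
        concat_dim=concat_dim,
    )
```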

Upvotes: 1
