Rob

Using xarray to open a multi-file dataset when both the files and dataset have a "time" component

I'm not sure how to word this question, but I hope this example explains it.

I have a series of netCDF files, one per day of data. Each file has a time dimension, which is a 30-day forecast.

If I read in a year's worth of data using:

data=xarray.open_mfdataset(files, concat_dim='None', autoclose='True')

Then I get:

Dimensions:   (None: 365, lat: 110, lon: 100, time: 395)

I'm only interested in the value at time = 0 for each file, i.e. for file = 0 I want time = 0, for file = 360 I want time = 360, etc.

Basically, I think what I want is to read in only the first element of the time dimension from each file, but I can't figure out how to do that with open_mfdataset.

Even just dropping the unwanted values after reading the whole thing in would be fine, but I can't figure that out either because of the way open_mfdataset concatenates the datasets.
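To see where the extra time steps in that output come from, here is a small in-memory sketch rather than the real files. The helper `make_forecast`, the variable `t2m`, the `file` dimension name and all sizes are hypothetical. Concatenating overlapping forecasts along a new dimension aligns their `time` coordinates with an outer join, so the combined time axis is the union of every file's forecast times, padded with NaN wherever a file has no data.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_forecast(start, steps=4):
    """Hypothetical stand-in for one daily file: a forecast issued on `start`."""
    time = pd.date_range(start, periods=steps, freq="D")
    return xr.Dataset(
        {"t2m": ("time", np.arange(steps, dtype=float))},
        coords={"time": time},
    )


# Three consecutive daily "files", each holding a 4-step forecast.
datasets = [make_forecast(d) for d in pd.date_range("2018-01-01", periods=3)]

# Concatenate along a new dimension, as concat_dim='None' does in the question.
# The time coordinates are aligned with an outer join, so the time axis becomes
# the union of all forecast times and the gaps are filled with NaN.
combined = xr.concat(datasets, dim="file", join="outer")
print(combined.sizes["file"], combined.sizes["time"])  # 3 6
print(int(combined["t2m"].isnull().sum()))  # 6
```

With 3 files of 4 steps each the union has 3 + 4 - 1 = 6 steps; the same overlap is what stretches the question's time axis past 365.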

Answers (1)

jhamman

Using a preprocess function will let you do what you're after. The preprocess function is applied to each file's dataset before concatenation, so you can use it to reformat the datasets during the open_mfdataset step.

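import xarray as xr

def preprocess(ds):
    '''Keep only the first time step of each file.'''
    return ds.isel(time=0)


# Recent xarray versions need combine='nested' when concat_dim is given;
# add any other open_mfdataset options you were already using.
data = xr.open_mfdataset(files, preprocess=preprocess,
                         combine='nested', concat_dim='time')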
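Since open_mfdataset calls preprocess on each file's dataset and then combines the results, the same two steps can be reproduced by hand in memory to check what preprocess does. This is a sketch on synthetic data: `make_forecast`, the `t2m` variable and the sizes are hypothetical, and xr.concat stands in for the combine step that open_mfdataset performs on real files.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_forecast(start, steps=4):
    """Hypothetical stand-in for one daily forecast file (synthetic data)."""
    time = pd.date_range(start, periods=steps, freq="D")
    return xr.Dataset(
        {"t2m": ("time", np.arange(steps, dtype=float))},
        coords={"time": time},
    )


def preprocess(ds):
    """Keep only the first time step of each file."""
    return ds.isel(time=0)


# Step 1: apply preprocess to every per-file dataset.
datasets = [make_forecast(d) for d in pd.date_range("2018-01-01", periods=3)]
trimmed = [preprocess(ds) for ds in datasets]

# Step 2: combine. Each trimmed dataset has a scalar 'time' coordinate, which
# concat promotes to the 'time' dimension, giving one step per file.
data = xr.concat(trimmed, dim="time")
print(data["t2m"].values)  # [0. 0. 0.]
```

Because isel(time=0) drops time as a dimension, no outer join over the 30-day forecast times happens, so there is no NaN padding to remove afterwards.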

Depending on how your files are formatted, you may need to clean up the datasets further in preprocess.
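If you would rather read everything in and select afterwards, as the question mentions, xarray's vectorized (pointwise) indexing can pick the "diagonal" of the file and time dimensions. This sketch assumes file i was issued on day i with no gaps, so that its first step sits at index i of the combined, sorted time axis; `make_forecast`, `t2m`, the `file` dimension name and the sizes are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_forecast(start, steps=4):
    """Hypothetical stand-in for one daily forecast file (synthetic data)."""
    time = pd.date_range(start, periods=steps, freq="D")
    return xr.Dataset(
        {"t2m": ("time", np.arange(steps, dtype=float))},
        coords={"time": time},
    )


starts = pd.date_range("2018-01-01", periods=3)
combined = xr.concat([make_forecast(d) for d in starts], dim="file", join="outer")

# Indexing both dimensions with DataArrays that share the 'file' dimension
# selects the pairs (file=i, time=i) pointwise, not the full 3x3 outer product.
idx = xr.DataArray(np.arange(combined.sizes["file"]), dims="file")
first_steps = combined.isel(file=idx, time=idx)
print(first_steps["t2m"].values)  # [0. 0. 0.]
```

In the question's dataset the file dimension is literally named "None" (because of concat_dim='None'), so the indexers would be passed as a dict, e.g. data.isel({"None": idx, "time": idx}). Unlike the preprocess approach, this reads and aligns every forecast step first.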
