Reputation: 3109
I'm using xarray.open_mfdataset() to open and combine 8 netCDF files (output from model simulations with different settings) without loading them into memory. This works great if I specify concat_dim='run_number', which adds run_number as a dimension without coordinates and just fills it with values from 0 to 7.
The problem is that I now don't know which run_number belongs to which simulation. The original netCDF files all have attributes that help me distinguish them, e.g. identifyer=1, identifyer=2, etc., but xarray does not recognize this, even if I specify concat_dim='identifyer' (perhaps because there are many attributes?).
Is there any way to tell xarray to use this attribute as concat_dim? Or alternatively, in which order does xarray read the input files, so that I can infer which value of the new dimension corresponds to which simulation?
Upvotes: 3
Views: 2325
Reputation: 9623
Xarray will use the values of existing scalar coordinates to label result coordinates, but it doesn't look at attributes. Only looking at metadata found in coordinates is a general theme in xarray: we leave attrs to user code only. So this should work if you assign scalar 'identifyer' coordinates to each dataset, e.g., using the preprocess argument to open_mfdataset:
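import xarray

def add_id(ds):
    # Copy the global attribute into a scalar coordinate so xarray can use it as a label.
    ds.coords['identifyer'] = ds.attrs['identifyer']
    return ds

xarray.open_mfdataset(path, preprocess=add_id)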
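To see the labelling in action without any files on disk, here is a minimal in-memory sketch. It uses xarray.concat as a stand-in for the combine step of open_mfdataset, and the three datasets, the temperature variable, and the time coordinate are made up for illustration. It assumes each dataset carries an integer identifyer attribute, as in the question.

```python
import numpy as np
import xarray as xr


def add_id(ds):
    """Promote the 'identifyer' attribute to a scalar coordinate."""
    return ds.assign_coords(identifyer=ds.attrs["identifyer"])


# Three stand-in "simulation outputs", deliberately out of order.
# The variable name and values are hypothetical.
runs = [
    xr.Dataset(
        {"temperature": ("time", np.full(4, float(run_id)))},
        coords={"time": np.arange(4)},
        attrs={"identifyer": run_id},
    )
    for run_id in (3, 1, 2)
]

# Concatenating along the scalar coordinate turns it into a labelled dimension.
combined = xr.concat([add_id(ds) for ds in runs], dim="identifyer")

# Runs can now be selected by their identifier instead of by position.
selected = combined["temperature"].sel(identifyer=1)
print(combined["identifyer"].values)
print(selected.values)
```

With real files, recent xarray versions may additionally need combine='nested' and concat_dim='identifyer' passed to open_mfdataset alongside preprocess; treat that as an assumption to check against the version you have installed.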
Alternatively, you can either pass an explicit list of filenames to open_mfdataset or rely on the fact that open_mfdataset sorts the glob of filenames before combining them: the datasets will always be combined in lexicographic order of their names.
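One thing to watch for with lexicographic order is that it is not numeric order, so run_10.nc sorts before run_2.nc. The sketch below uses only the standard library and hypothetical file names (run_1.nc, run_2.nc, run_10.nc, created as empty placeholders) to rebuild the same sorted list and map each position along the new dimension to its file.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create empty placeholder files; the names are hypothetical.
    for n in (1, 2, 10):
        open(os.path.join(tmp, f"run_{n}.nc"), "w").close()

    # Sort the glob result, mirroring the ordering described above.
    paths = sorted(glob.glob(os.path.join(tmp, "run_*.nc")))
    names = [os.path.basename(p) for p in paths]

# Position along the concatenated dimension -> source file.
run_number_to_file = dict(enumerate(names))
print(run_number_to_file)
# {0: 'run_1.nc', 1: 'run_10.nc', 2: 'run_2.nc'}
```

Passing an explicit list such as paths to open_mfdataset removes the guesswork, since you then control the order yourself.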
Upvotes: 5