Reputation: 97
I've noticed that if you have an xarray DataArray and perform simple(!) calculations on it, the attributes get 'deleted'.
Example:
import numpy as np
import xarray as xr
example = xr.DataArray(np.array([1, 2, 3]), attrs={'one': 1})
without_Attributes = example * 3
On the other hand, if you use numpy-style methods (e.g. .round(), ...), the attributes remain. Is there a reasonable explanation for this? And is there a way to multiply the DataArray without losing its attributes?
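For reference, a minimal check of the reported behaviour, reusing example from above (exact results can differ between xarray versions, since the keep_attrs defaults have changed over time):
print((example * 3).attrs)      # arithmetic: attrs are typically dropped -> {}
print(example.round().attrs)    # numpy-style method: attrs are typically kept -> {'one': 1}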
Upvotes: 2
Views: 3752
Reputation: 196
My suggestion is to put the calculations that should keep the attributes into a local context:
# Compute weighted time-averages keeping units and other attributes
with xr.set_options(keep_attrs=True):
    ds = (ds * weights).mean("time")
This way, there's no need to change the configuration globally, so calculations in other places will not have unexpected side effects regarding attributes.
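A quick self-contained check (using a toy DataArray, since ds and weights above are placeholders) shows that the option only applies inside the with block:
import numpy as np
import xarray as xr
da = xr.DataArray(np.arange(3.0), dims="time", attrs={"units": "mm"})
with xr.set_options(keep_attrs=True):
    print(da.mean("time").attrs)   # {'units': 'mm'} -- kept inside the block
print(da.mean("time").attrs)       # {} -- outside the block the default applies again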
Upvotes: 0
Reputation: 289
Suppose you have an xarray Dataset ds like this:
xarray.Dataset
Dimensions:  (time: 43830, lon: 135, lat: 129)
Coordinates:
    time     (time) datetime64[ns] 1901-01-01 ... 2020-12-31
    lon      (lon) float64 66.5 66.75 67.0 ... 99.75 100.0
    lat      (lat) float64 6.5 6.75 7.0 ... 38.0 38.25 38.5
Data variables:
    rf       (time, lat, lon) float32 dask.array<chunksize=(365, 129, 135), meta=np.ndarray>
Attributes:
    CDI:          Climate Data Interface version 2.0.0rc1 (https://mpimet.mpg.de/cdi)
    Conventions:  CF-1.6
and you want to delete the attribute named CDI from this dataset. Then you can use del ds.attrs['CDI'] before exporting/saving the dataset to a NetCDF file.
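For completeness, a minimal sketch of that workflow (the file names here are hypothetical):
import xarray as xr
ds = xr.open_dataset('rainfall.nc')    # hypothetical input file
del ds.attrs['CDI']                    # remove the unwanted global attribute
ds.to_netcdf('rainfall_clean.nc')      # the saved file no longer carries 'CDI'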
Upvotes: -1
Reputation: 15432
From the xarray docs on "what is your approach to metadata?":
We are firm believers in the power of labeled data! In addition to dimensions and coordinates, xarray supports arbitrary metadata in the form of global (Dataset) and variable specific (DataArray) attributes (attrs).
Automatic interpretation of labels is powerful but also reduces flexibility. With xarray, we draw a firm line between labels that the library understands (dims and coords) and labels for users and user code (attrs). For example, we do not automatically interpret and enforce units or CF conventions. (An exception is serialization to and from netCDF files.)
An implication of this choice is that we do not propagate attrs through most operations unless explicitly flagged (some methods have a keep_attrs option, and there is a global flag for setting this to be always True or False). Similarly, xarray does not check for conflicts between attrs when combining arrays and datasets, unless explicitly requested with the option compat='identical'. The guiding principle is that metadata should not be allowed to get in the way.
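For instance, most reductions accept keep_attrs directly, so you can opt in per call instead of globally; a small self-contained example:
import numpy as np
import xarray as xr
da = xr.DataArray(np.arange(4.0), dims="time", attrs={"units": "K"})
print(da.mean("time", keep_attrs=True).attrs)   # {'units': 'K'} -- kept for this call only
print(da.mean("time").attrs)                    # {} -- the default still drops attrs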
You can set global options in xarray with xr.set_options:
In [14]: xr.set_options(keep_attrs=True)
Out[14]: <xarray.core.options.set_options at 0x133ef58e0>
Now, attributes are preserved:
In [15]: example * 3
Out[15]:
<xarray.DataArray (dim_0: 3)>
array([3, 6, 9])
Dimensions without coordinates: dim_0
Attributes:
one: 1
Note that xarray does not do anything "smart" with these attributes, which is why the default behavior is to drop them in computation. For example, a simple case with units shows how setting keep_attrs=True can go off the rails:
In [17]: dist = xr.DataArray(np.array([1,2,3]), attrs={'units': 'm'})
...: dist
Out[17]:
<xarray.DataArray (dim_0: 3)>
array([1, 2, 3])
Dimensions without coordinates: dim_0
Attributes:
units: m
In [18]: rate = xr.DataArray(np.array([2, 2, 2]), attrs={'units': 'm/s'})
...: rate
Out[18]:
<xarray.DataArray (dim_0: 3)>
array([2, 2, 2])
Dimensions without coordinates: dim_0
Attributes:
units: m/s
In [19]: dist / rate
Out[19]:
<xarray.DataArray (dim_0: 3)>
array([0.5, 1. , 1.5])
Dimensions without coordinates: dim_0
Attributes:
units: m
The result silently keeps the units of the first operand ('m'), even though dividing metres by metres per second should give seconds. If you want to handle units explicitly in computation with xarray, have a look at pint-xarray, which is an effort to integrate the pint project's explicit unit handling with xarray. The project is experimental and its API is not yet stable, but there has been considerable recent work by both the pint-xarray crew and xarray's core team to move in the same direction, so I don't expect this coordination to go away.
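A minimal sketch of what that looks like, assuming pint-xarray is installed and using its .pint.quantify() accessor (the exact API may have changed since this was written):
import numpy as np
import xarray as xr
import pint_xarray  # noqa: F401 -- importing registers the .pint accessor
dist = xr.DataArray(np.array([1, 2, 3]), attrs={'units': 'm'}).pint.quantify()
rate = xr.DataArray(np.array([2, 2, 2]), attrs={'units': 'm/s'}).pint.quantify()
elapsed = dist / rate        # pint propagates the units through the division
print(elapsed.pint.units)    # expected to report seconds, not metres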
Note that since Dataset and DataArray attributes are simply dictionaries, preserving them is easy:
In [22]: result = example * 3
...: result.attrs.update(example.attrs)
In [23]: result
Out[23]:
<xarray.DataArray (dim_0: 3)>
array([3, 6, 9])
Dimensions without coordinates: dim_0
Attributes:
one: 1
You can even work with them independently of the DataArray or Dataset:
In [25]: ds = xr.open_dataset('my_well_documented_file.nc')
In [26]: source_attrs = ds.attrs
In [23]: result = xr.Dataset({'new_var': ds.varname * 3})
In [24]: result.attrs.update(
...: # custom new attrs
...: method='multiplied varname by 3',
...: updated=pd.Timestamp.now(tz='US/Pacific').strftime('%c'),
...: # carry forward attrs from input file
   ...:     **{k: source_attrs[k] for k in ['author', 'contact']},
...: )
So the approach I generally take is to explicitly copy over the attributes I want at the end of a computation. And, if desired, you can handle units explicitly with pint-xarray and then carry forward other metadata as a dictionary.
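If you do this a lot, a small helper (hypothetical, not part of xarray's API) keeps the pattern in one place; here it reuses example from the question:
def copy_attrs(result, source, keys=None):
    """Copy attrs from source onto result, optionally restricted to keys."""
    attrs = dict(source.attrs) if keys is None else {k: source.attrs[k] for k in keys}
    result.attrs.update(attrs)
    return result
result = copy_attrs(example * 3, example)   # result.attrs == {'one': 1}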
Upvotes: 8