BananaJoe

Reputation: 97

DataArray deletes Attributes in simple computation

I've noticed that if you have an xarray DataArray and perform simple(!) calculations on it, the attributes get 'deleted'.

Example:

import numpy as np
import xarray as xr
example            = xr.DataArray(np.array([1, 2, 3]), attrs={'one': 1})
without_Attributes = example * 3

On the other hand, if you use NumPy-style methods (e.g. .round(x)), the attributes remain. Is there a reasonable explanation for this? And is there a way to multiply the DataArray without losing its attributes?

Upvotes: 2

Views: 3752

Answers (3)

Markus

Reputation: 196

My suggestion is to put the calculations that should keep the attributes into a local context:

# Compute weighted time-averages keeping units and other attributes
with xr.set_options(keep_attrs=True):
    ds = (ds * weights).mean("time")

This way, there's no need to change the configuration globally, so calculations in other places will not have unexpected side effects regarding attributes.
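
For a self-contained illustration, here is a minimal sketch using the small array from the question. The names kept and dropped are just for illustration, and keep_attrs=False is set explicitly in the second block so the contrast does not depend on the default of whichever xarray version is installed.

```python
import numpy as np
import xarray as xr

example = xr.DataArray(np.array([1, 2, 3]), attrs={"one": 1})

# Inside this block, arithmetic carries the input's attributes over.
with xr.set_options(keep_attrs=True):
    kept = example * 3

# With the option explicitly off, the result has empty attrs.
with xr.set_options(keep_attrs=False):
    dropped = example * 3

print(kept.attrs)
print(dropped.attrs)
```

The option is restored to its previous value when each with block exits, so code elsewhere is unaffected.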

Upvotes: 0

Abhilash Singh Chauhan

Reputation: 289

Suppose you have an xarray Dataset ds like this:

xarray.Dataset

Dimensions:

time: 43830 lon: 135 lat: 129

Coordinates:

time  (time)  datetime64[ns]   1901-01-01 ... 2020-12-31
lon   (lon)   float64          66.5 66.75 67.0 ... 99.75 100.0
lat   (lat)   float64          6.5 6.75 7.0 ... 38.0 38.25 38.5

Data variables:

rf    (time, lat, lon)    float32    dask.array<chunksize=(365, 129, 135), meta=np.ndarray>

Attributes:

CDI : Climate Data Interface version 2.0.0rc1 (https://mpimet.mpg.de/cdi)
Conventions : CF-1.6

To delete the attribute named CDI from this dataset, use del ds.attrs['CDI'] before exporting or saving the dataset to a NetCDF file.
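
As a rough sketch, the same thing on a tiny stand-in dataset (the three-element rf variable here is made up for illustration, not the real data). Since attrs is a plain dictionary, pop with a default is an alternative to del that does not raise if the key is already gone.

```python
import numpy as np
import xarray as xr

# Small stand-in for the dataset above (hypothetical data values).
ds = xr.Dataset(
    {"rf": (("time",), np.zeros(3, dtype="float32"))},
    attrs={
        "CDI": "Climate Data Interface version 2.0.0rc1 (https://mpimet.mpg.de/cdi)",
        "Conventions": "CF-1.6",
    },
)

# Remove the global attribute; the default of None avoids a KeyError
# when the attribute is not present.
ds.attrs.pop("CDI", None)

print(ds.attrs)
```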

Upvotes: -1

Michael Delgado

Reputation: 15432

From the xarray docs on "what is your approach to metadata?":

We are firm believers in the power of labeled data! In addition to dimensions and coordinates, xarray supports arbitrary metadata in the form of global (Dataset) and variable specific (DataArray) attributes (attrs).

Automatic interpretation of labels is powerful but also reduces flexibility. With xarray, we draw a firm line between labels that the library understands (dims and coords) and labels for users and user code (attrs). For example, we do not automatically interpret and enforce units or CF conventions. (An exception is serialization to and from netCDF files.)

An implication of this choice is that we do not propagate attrs through most operations unless explicitly flagged (some methods have a keep_attrs option, and there is a global flag for setting this to be always True or False). Similarly, xarray does not check for conflicts between attrs when combining arrays and datasets, unless explicitly requested with the option compat='identical'. The guiding principle is that metadata should not be allowed to get in the way.

You can set global options in xarray with xr.set_options:

In [14]: xr.set_options(keep_attrs=True)
Out[14]: <xarray.core.options.set_options at 0x133ef58e0>

Now, attributes are preserved:

In [15]: example * 3
Out[15]:
<xarray.DataArray (dim_0: 3)>
array([3, 6, 9])
Dimensions without coordinates: dim_0
Attributes:
    one:      1
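
The per-method keep_attrs option mentioned in the quoted docs works without touching the global flag. A minimal sketch with a reduction (sum is used here only as an example of a method that accepts keep_attrs):

```python
import numpy as np
import xarray as xr

example = xr.DataArray(np.array([1, 2, 3]), attrs={"one": 1})

# Ask this one reduction to carry the attributes over.
total_kept = example.sum(keep_attrs=True)

# Ask the same reduction to drop them.
total_dropped = example.sum(keep_attrs=False)

print(total_kept.attrs)
print(total_dropped.attrs)
```

Plain arithmetic operators like * have no place to pass such an argument, which is why the option (global or in a with block) is needed there.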

Note that xarray does not do anything "smart" with these attributes, which is why the default behavior is to drop them in computation. A simple example with units shows how setting keep_attrs=True can go off the rails:

In [17]: dist = xr.DataArray(np.array([1,2,3]), attrs={'units': 'm'})
    ...: dist
Out[17]:
<xarray.DataArray (dim_0: 3)>
array([1, 2, 3])
Dimensions without coordinates: dim_0
Attributes:
    units:    m

In [18]: rate = xr.DataArray(np.array([2, 2, 2]), attrs={'units': 'm/s'})
    ...: rate
Out[18]:
<xarray.DataArray (dim_0: 3)>
array([2, 2, 2])
Dimensions without coordinates: dim_0
Attributes:
    units:    m/s

In [19]: dist / rate
Out[19]:
<xarray.DataArray (dim_0: 3)>
array([0.5, 1. , 1.5])
Dimensions without coordinates: dim_0
Attributes:
    units:    m

The quotient is a time in seconds, yet the output still labels it with units: m.

If you want to explicitly handle units in computation with xarray, have a look at pint-xarray, which is an effort to integrate the pint project's explicit unit handling with xarray. This project is experimental and the API is not stable, but there has been considerable work lately by both the pint-xarray crew and xarray's core team to move in the same direction, so I don't expect this coordination to go away.

Workaround (or maybe the best of all worlds?)

Note that since Dataset and DataArray attributes are simply dictionaries, preserving them is easy:

In [22]: result = example * 3
    ...: result.attrs.update(example.attrs)

In [23]: result
Out[23]:
<xarray.DataArray (dim_0: 3)>
array([3, 6, 9])
Dimensions without coordinates: dim_0
Attributes:
    one:      1
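
If you prefer a one-liner, assign_attrs returns a new object with the given attributes added, so the copy can be chained onto the computation. A small sketch (the scaled_by attribute is a made-up extra, just to show that new entries can be added in the same call):

```python
import numpy as np
import xarray as xr

example = xr.DataArray(np.array([1, 2, 3]), attrs={"one": 1})

# Compute, then attach the source attrs plus one new entry in a single chain.
result = (example * 3).assign_attrs(example.attrs, scaled_by=3)

print(result.attrs)
```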

You can even work with them independently of the DataArray or Dataset:


In [25]: ds = xr.open_dataset('my_well_documented_file.nc')

In [26]: source_attrs = ds.attrs

In [27]: result = xr.Dataset({'new_var': ds.varname * 3})

In [28]: result.attrs.update(
    ...:     # custom new attrs
    ...:     method='multiplied varname by 3',
    ...:     updated=pd.Timestamp.now(tz='US/Pacific').strftime('%c'),
    ...:     # carry forward attrs from input file
    ...:     **{k: source_attrs[k] for k in ['author', 'contact']},
    ...: )

So the approach I generally take is to explicitly copy over the attributes I want at the end of computation. And, if desired, you can handle units explicitly with pint-xarray and then carry forward other metadata as a dictionary.
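
Because attrs are ordinary dictionaries, the copy-what-you-want step can be wrapped in a small helper that needs nothing from xarray. This is only a sketch: build_attrs is a hypothetical name, and the sample values in source_attrs are invented for the example.

```python
from datetime import datetime, timezone


def build_attrs(source_attrs, keep, **extra):
    """Return a new dict with the selected source attrs plus extra entries.

    Keys listed in `keep` that are missing from `source_attrs` are skipped
    rather than raising a KeyError.
    """
    carried = {k: source_attrs[k] for k in keep if k in source_attrs}
    return {**carried, **extra}


# Invented sample metadata standing in for ds.attrs.
source_attrs = {
    "author": "A. Researcher",
    "contact": "a.researcher@example.org",
    "history": "raw download",
}

new_attrs = build_attrs(
    source_attrs,
    keep=["author", "contact"],
    method="multiplied varname by 3",
    updated=datetime.now(timezone.utc).isoformat(),
)

print(sorted(new_attrs))
```

The resulting dictionary can then be applied with result.attrs.update(new_attrs).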

Upvotes: 8
