Conic

Reputation: 1198

Check if value exists in python xarray dataset

I'm cutting up xarrays into small cubes of data for a machine learning process and am trying to filter out cubes with no-data values in them.

I want to keep the memory footprint small, so I have assigned an unlikely value of -999 to no-data values. This keeps everything int16 instead of requiring a larger type to hold NaN.

Question: What is the best way to check if -999 exists in an xarray.Dataset?

Here is what I have:

(dataset == -999).any()  

will yield:

<xarray.Dataset>
Dimensions:  ()
Data variables:
    var_a      bool True
    var_b      bool True
    var_c      bool False  

after which I would likely have to select something like var_a. My code would end up looking like this:

def is_clean(dataset):
    return (dataset == -999).any().var_a is True 

Maybe I'm still fresh when it comes to Xarrays, but I can't find a nicer way to do this in the docs. What bit of structural knowledge about xarrays am I missing that keeps me from being ok with my current solution? Any hints?

Upvotes: 1

Views: 5816

Answers (1)

shoyer

Reputation: 9593

Expressions on xarray objects generally return new xarray objects of the same type. This means (dataset.var_a == -999).any() results in a scalar xarray.DataArray object.

Like scalar NumPy arrays, scalar DataArray objects can be unboxed by calling built-in types such as bool() or float() on them. This happens implicitly inside the condition of an if statement, for example. Also like NumPy arrays, you can unbox a scalar DataArray of any dtype with the .item() method.
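As a minimal sketch of the unboxing described above (the 2x2 int16 array and the names has_nodata, as_bool, as_item, and status are made up for illustration):

```python
import numpy as np
import xarray as xr

# Hypothetical int16 cube that uses -999 as the no-data sentinel.
var_a = xr.DataArray(
    np.array([[1, 2], [-999, 4]], dtype=np.int16), dims=("y", "x")
)

# The reduction returns a scalar (0-dimensional) DataArray, not a Python bool.
has_nodata = (var_a == -999).any()

as_bool = bool(has_nodata)   # explicit unboxing with a built-in type
as_item = has_nodata.item()  # unboxing with .item()

# An if statement calls bool() on the scalar DataArray implicitly.
if has_nodata:
    status = "contains no-data"
else:
    status = "clean"
```

Note that an identity test such as `has_nodata is True` evaluates to False here, because the DataArray is a wrapper object rather than the Python `True` singleton. Unbox first, then compare.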

To check every data variable in a Dataset, you can either iterate over the Dataset using dictionary-like access, e.g.,

def is_clean(dataset):
    return all((v != -999).all() for v in dataset.data_vars.values())

Or you could convert the whole Dataset into a single DataArray by calling .to_array(), e.g.,

def is_clean(dataset):
    return bool((dataset.to_array() != -999).all())

To avoid excess memory usage, you might convert to an array after reducing, which is a little longer but not too bad:

def is_clean(dataset):
    return bool((dataset != -999).all().to_array().all())
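For illustration, here is a small self-contained sketch of how the reduce-then-convert version could be used to filter cubes. The two tiny datasets, the NODATA constant, and the cubes/kept lists are assumptions made up for this example, not part of the original question:

```python
import numpy as np
import xarray as xr

NODATA = -999  # sentinel value from the question


def is_clean(dataset):
    # Reduce each variable to a scalar first, then stack the scalars
    # into one small DataArray and reduce again.
    return bool((dataset != NODATA).all().to_array().all())


# Hypothetical int16 data for two cubes.
a = np.ones((2, 2), dtype=np.int16)
b = np.full((2, 2), 7, dtype=np.int16)
b_dirty = b.copy()
b_dirty[0, 1] = NODATA

clean = xr.Dataset({"var_a": (("y", "x"), a), "var_b": (("y", "x"), b)})
dirty = xr.Dataset({"var_a": (("y", "x"), a), "var_b": (("y", "x"), b_dirty)})

# Keep only the cubes without any no-data value.
cubes = [clean, dirty]
kept = [cube for cube in cubes if is_clean(cube)]
```

Because each variable is reduced before .to_array() is called, the stacked array only holds one boolean per data variable instead of a copy of the full cube.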

Upvotes: 4
