Reputation: 1198
I'm cutting up xarrays into small cubes of data for a machine learning process and am trying to filter out cubes with no-data values in them.
I want to keep the memory footprint small, so I have assigned an unlikely value of -999 to no-data values. This keeps everything int16 instead of requiring a larger type for NaN.
Question: What is the best way to check if -999 exists in an xarray.Dataset?
Here is what I have:
(dataset == -999).any()
will yield:
<xarray.Dataset>
Dimensions: ()
Data variables:
var_a bool True
var_b bool True
var_c bool False
after which I would likely have to select something like var_a. My code would end up looking like this:
def is_clean(dataset):
    return (dataset == -999).any().var_a is True
Maybe I'm still fresh when it comes to Xarrays, but I can't find a nicer way to do this in the docs. What bit of structural knowledge about xarrays am I missing that keeps me from being ok with my current solution? Any hints?
Upvotes: 1
Views: 5816
Reputation: 9593
Expressions on xarray objects generally return new xarray objects of the same type. This means (dataset.var_a == -999).any() results in a scalar xarray.DataArray object.
Like scalar NumPy arrays, scalar DataArray objects can be unboxed by calling builtin types on them, like bool() or float(). This happens implicitly inside the condition of an if statement, for example. Also like NumPy arrays, you can unbox a scalar DataArray of any dtype with the .item() method.
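As a minimal sketch of the unboxing described above (the array contents and the names has_nodata, as_bool, as_item and status are made up for illustration):

```python
import numpy as np
import xarray as xr

# Hypothetical int16 variable containing one -999 sentinel value.
var_a = xr.DataArray(
    np.array([[1, 2], [-999, 4]], dtype="int16"),
    dims=("y", "x"),
)

# The reduction returns a scalar (0-dimensional) DataArray, not a Python bool.
has_nodata = (var_a == -999).any()

as_bool = bool(has_nodata)    # explicit unboxing with a builtin type
as_item = has_nodata.item()   # unboxing with .item()

# Implicit unboxing inside an if condition.
if has_nodata:
    status = "contains no-data"
else:
    status = "clean"
```

Because has_nodata is still a DataArray, an identity check such as `has_nodata is True` would not behave like a comparison of booleans; unboxing first avoids that.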
To check every data variable in a Dataset, you'll either need to iterate over the Dataset using dictionary-like access, e.g.,
def is_clean(dataset):
    return all((v != -999).all() for v in dataset.data_vars.values())
Or you could convert the whole Dataset into a single DataArray by calling .to_array(), e.g.,
def is_clean(dataset):
    return bool((dataset.to_array() != -999).all())
To avoid excess memory usage, you might convert to an array after reducing, which is a little longer but not too bad:
def is_clean(dataset):
    return bool((dataset != -999).all().to_array().all())
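For a quick sanity check, here is a small sketch that runs the reduce-then-convert version on two toy cubes. The NODATA constant, the make_cube helper and the cube contents are assumptions for illustration, not part of the question's data:

```python
import numpy as np
import xarray as xr

NODATA = -999  # sentinel value used in the question


def is_clean(dataset):
    # Reduce each variable to a scalar bool first, then combine the scalars.
    return bool((dataset != NODATA).all().to_array().all())


def make_cube(values_a, values_b):
    # Hypothetical helper: build a tiny int16 Dataset with two variables.
    return xr.Dataset(
        {
            "var_a": (("y", "x"), np.array(values_a, dtype="int16")),
            "var_b": (("y", "x"), np.array(values_b, dtype="int16")),
        }
    )


clean_cube = make_cube([[1, 2], [3, 4]], [[5, 6], [7, 8]])
dirty_cube = make_cube([[1, 2], [3, 4]], [[5, NODATA], [7, 8]])

clean_result = is_clean(clean_cube)
dirty_result = is_clean(dirty_cube)
```

Here clean_result should come out True and dirty_result False, since only the second cube has a sentinel in one of its variables.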
Upvotes: 4