hlee

Reputation: 343

Masking out NaNs from multiple xarray.Datasets in Python

How can I mask out NaNs across multiple xarray Datasets of the same shape so that all of them keep a common shape with no NaNs?

import numpy as np
import pandas as pd
import xarray as xr

arrays = [np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
          np.array(["one", "two", "one", "two", "one", "two", "one", "two"])]
df1 = pd.DataFrame(np.random.randn(8, 4), index=arrays)
df1.iloc[[2, 3, 2], :] = np.nan
ds1 = df1.to_xarray()

arrays = [np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
          np.array(["one", "two", "one", "two", "one", "two", "one", "two"])]
df2 = pd.DataFrame(np.random.randn(8, 4), index=arrays)
df2.iloc[[1, 4, 1], :] = np.nan
ds2 = df2.to_xarray()

arrays = [np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
          np.array(["one", "two", "one", "two", "one", "two", "one", "two"])]
df3 = pd.DataFrame(np.random.randn(8, 4), index=arrays)
df3.iloc[[2, 1, 1], :] = np.nan
ds3 = df3.to_xarray()

In the example datasets above, I set different rows to NaN in each dataset. I want to mask out every row where any of the datasets has NaNs. The expected result is then a DataFrame without the second through fifth rows from the top, which looks like:

df3.iloc[[0, 5, 6, 7], :]

Although I described this in terms of pd.DataFrame for convenience and visualization, I want to do it within the xarray.Dataset structure. My attempt used xr.Dataset.where() like this:

ds1_masked = ds1.where(ds1 != np.nan and ds2 != np.nan and ds3 != np.nan, 
drop=True)

which didn't work (a dataset without any variables was created).

Upvotes: 2

Views: 2553

Answers (1)

hlee

Reputation: 343

Here is a solution from my side:

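# The variables are named by the integer column labels, so select them with ds1[0] (ds1.0 is a syntax error).
# Only variable 0 is checked, which is enough here because whole rows are NaN.
mask = ~(np.isnan(ds1[0]) | np.isnan(ds2[0]) | np.isnan(ds3[0]))
ds1_mask_nan = ds1.where(mask, np.nan)
ds1_mask_out = ds1_mask_nan.where(~np.isnan(ds1_mask_nan[0]), drop=True)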
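For reference, here is a minimal sketch of a more general variant, assuming the setup from the question. The helpers `make_ds` and `all_valid` are hypothetical names introduced only for this illustration. The attempt in the question failed for two reasons: `NaN != NaN` is always True, so `!= np.nan` never detects NaNs, and `and` does not combine arrays element-wise. Using `notnull()` with `&` avoids both problems. Note that `to_xarray()` turns the two index levels into the dimensions `level_0` and `level_1`, so `where(..., drop=True)` can only drop a label that is masked everywhere; to drop individual rows as in the DataFrame picture, the two levels are stacked into one dimension first.

```python
import numpy as np
import pandas as pd
import xarray as xr

index = pd.MultiIndex.from_product([["bar", "baz", "foo", "qux"], ["one", "two"]])

def make_ds(nan_rows, seed):
    # Hypothetical helper: rebuilds one dataset like those in the question.
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(rng.standard_normal((8, 4)), index=index)
    df.iloc[nan_rows, :] = np.nan
    return df.to_xarray()

def all_valid(ds):
    # Hypothetical helper: True where every variable of ds is non-NaN.
    return ds.to_array().notnull().all("variable")

ds1, ds2, ds3 = make_ds([2, 3], 0), make_ds([1, 4], 1), make_ds([1, 2], 2)

# Element-wise AND of the three validity masks, dims (level_0, level_1).
valid = all_valid(ds1) & all_valid(ds2) & all_valid(ds3)

# 2-D result: only labels that are masked everywhere are dropped,
# so partially valid labels keep NaN placeholders.
ds1_2d = ds1.where(valid, drop=True)

# Row-wise result: stack both levels into a single "row" dimension,
# then drop every row that contains a NaN.
ds1_rows = ds1.where(valid).stack(row=("level_0", "level_1")).dropna("row")
```

The same `valid` mask can be applied to `ds2` and `ds3` in the same way, so all three end up with a common shape.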

Upvotes: 2
