Reputation: 6892
I have this sample Dataset containing worldwide air temperature, and more importantly, a mask land
, marking land/non-water areas.
<xarray.Dataset>
Dimensions: (lat: 55, lon: 143, time: 5)
Coordinates:
* time (time) datetime64[ns] 2016-01-01 2016-01-02 2016-01-03 ...
* lat (lat) float64 -52.5 -50.0 -47.5 -45.0 -42.5 -40.0 -37.5 -35.0 ...
* lon (lon) float64 -177.5 -175.0 -172.5 -170.0 -167.5 -165.0 -162.5 ...
land (lat, lon) bool False False False False False False False False ...
Data variables:
airt (time, lat, lon) float64 7.952 7.61 7.389 7.267 7.124 6.989 ...
I can now mask the oceans and plot it
dry_areas = ds.where(ds.land)
dry_areas.airt.plot()
<xarray.Dataset>
Dimensions: (lat: 55, lon: 143)
Coordinates:
* lat (lat) float64 -52.5 -50.0 -47.5 -45.0 -42.5 -40.0 -37.5 -35.0 ...
* lon (lon) float64 -177.5 -175.0 -172.5 -170.0 -167.5 -165.0 -162.5 ...
land (lat, lon) bool False False False False False False False False ...
Data variables:
airt (lat, lon) float64 nan nan nan nan nan nan nan nan nan nan nan ...
How can I now get the coordinates for all non-nan values?
dry_areas.coords
gives me the bounding box and I can't get lat and lon into the (55, 143)
shape so I could apply the mask on.
The only working workaround I could find is
dry_areas.to_dataframe().dropna().reset_index()[['lat', 'lon']].values
, which does not feel very lean and clean.
I feel this is quite simply, however I am clearly not a numpy/matrix ninja.
Best solution so far
This is the shortest I could come with so far:
lon, lat = np.meshgrid(ds.coords['lon'], ds.coords['lat'])
lat_masked = ma.array(lat, mask=dry_areas.airt.fillna(False))
lon_masked = ma.array(lon, mask=dry_areas.airt.fillna(False))
land_coordinates = zip(lat_masked[lat_masked.mask].data, lon_masked[lon_masked.mask].data)
Upvotes: 5
Views: 11763
Reputation: 8490
You can use .stack
to get an array of coord pairs of the non-null values:
In [31]: da=xr.DataArray(np.arange(20).reshape(5,4))
In [33]: da_nans = da.where(da % 2 == 1)
In [34]: da_nans
Out[34]:
<xarray.DataArray (dim_0: 5, dim_1: 4)>
array([[ nan, 1., nan, 3.],
[ nan, 5., nan, 7.],
[ nan, 9., nan, 11.],
[ nan, 13., nan, 15.],
[ nan, 17., nan, 19.]])
Coordinates:
* dim_0 (dim_0) int64 0 1 2 3 4
* dim_1 (dim_1) int64 0 1 2 3
In [35]: da_stacked = da_nans.stack(x=['dim_0','dim_1'])
In [36]: da_stacked
Out[36]:
<xarray.DataArray (x: 20)>
array([ nan, 1., nan, 3., nan, 5., nan, 7., nan, 9., nan,
11., nan, 13., nan, 15., nan, 17., nan, 19.])
Coordinates:
* x (x) object (0, 0) (0, 1) (0, 2) (0, 3) (1, 0) (1, 1) (1, 2) ...
In [37]: da_stacked[da_stacked.notnull()]
Out[37]:
<xarray.DataArray (x: 10)>
array([ 1., 3., 5., 7., 9., 11., 13., 15., 17., 19.])
Coordinates:
* x (x) object (0, 1) (0, 3) (1, 1) (1, 3) (2, 1) (2, 3) (3, 1) ...
Upvotes: 11