Reputation: 29
I have multiple netCDF files that eventually i want to merge. An example netCDF is as follows.
import xarray as xr
import numpy as np
import cftime
Rain_nc = xr.open_dataset('filepath.nc', decode_times=False)
print(Rain_nc)
<xarray.Dataset>
Dimensions: (land: 67209, tstep:248)
Dimensions without coordinates: land, tstep
Data variables:
lon (land) float32...
lat (land) float32...
timestp(tstep) int32...
time (tstep) int32...
Rainf (tstep, land) float32...
the dimension 'land' is a count of numbers 1 to 67209, and 'tstep' is a count from 1 to 248.
the variable 'lat' and 'lon' are latitude and longitude values with a shape of (67209,)
the variable 'time' is the time in seconds since the start of the month (netcdf is a month long)
Next ive swapped the dimensions from 'tstep' to 'time', converted it for later merging and set coords as lon, lat and time.
rain_nc = rain_nc.swap_dims({'tstep':'time'})
rain_nc = rain_nc.set_coords(['lon', 'lat', 'time'])
rain_nc['time'] = cftime.num2date(rain_nc['time'], units='seconds since 2016-01-01 00:00:00', calendar = 'standard')
rain_nc['time'] = cftime.date2num(rain_nc['time'], units='seconds since 1970-01-01 00:00:00', calendar = 'standard')
this has left me with the following Dataset:
print(rain_nc)
<xarray.Dataset>
Dimensions: (land: 67209, time: 248)
Coordinates:
lon (land)float32
lat (land)float32
* time (time)float64
Dimensions without coordinates: land
Data variables:
timestp (time)int32
Rainf (time, land)
print(rain_nc['land'])
<xarray.DataArray 'land' (land: 67209)>
array([ 0, 1, 2,..., 67206, 67207, 67208])
Coordinates:
lon (land) float32 ...
lat (land) float32 ...
Dimensions without coordinates: land
the Rainf variable i am interested in is as follows:
<xarray.DataArray 'Rainf' (time: 248, land: 67209)>
[16667832 values with dtype=float32]
Coordinates:
lon (land) float32 -179.75 -179.75 -179.75 ... 179.75 179.75
179.75
lat (land) float32 71.25 70.75 68.75 68.25 ... -16.25 -16.75
-19.25
* time (time) float64 1.452e+09 1.452e+09 ... 1.454e+09 1.454e+09
Dimensions without coordinates: land
Attributes:
title: Rainf
units: kg/m2s
long_name: Mean rainfall rate over the \nprevious 3 hours
actual_max: 0.008114143
actual_min: 0.0
Fill_value: 1e+20
From here i would like to create a netCDF with the dimensions (time, lat, lon) and the variable Rainf.
I have tried creating a new netCDF (or alter this one) but when i try to pass the Rainf variable does not work as it has a shape of (248, 67209) and needs a shape of (248, 67209, 67209). Even though the current 'land' dimension of 'Rainf' has a lat and lon coordinate. Is it possible to reshape this variable to have a time, lat, and lon dimension?
Upvotes: 0
Views: 1529
Reputation: 3542
In the end it seems that what you want is to reshape the "land"
dimensions to the ("lat", "lon")
ones.
So, you have some DataArray similar to this:
# Setting sizes and coordinates
lon_size, lat_size = 50, 80
lon, lat = [arr.flatten() for arr in np.meshgrid(range(lon_size), range(lat_size))]
land_size = lon_size * lat_size
time_size = 100
da = xr.DataArray(
dims=("time", "land"),
data=np.random.randn(time_size, land_size),
coords=dict(
time=np.arange(time_size),
lon=("land", lon),
lat=("land", lat),
)
)
which looks like this:
>>> da
<xarray.DataArray (time: 100, land: 4000)>
array([[...]])
Coordinates:
* time (time) int64 0 1 ... 98 99
lon (land) int64 0 1 ... 48 49
lat (land) int64 0 0 ... 79 79
Dimensions without coordinates: land
First, we'll use the .set_index()
method to tell xarray that the "land"
index should be represented from the "lon"
and "lat"
coordinates:
>>> da.set_index(land=("lon", "lat"))
<xarray.DataArray (time: 100, land: 4000)>
array([[...]])
Coordinates:
* time (time) int64 0 1 ... 98 99
* land (land) MultiIndex
- lon (land) int64 0 1 ... 48 49
- lat (land) int64 0 0 ... 79 79
The dimensions are still ("time", "land")
, but now "land"
is a MultiIndex
.
Note that if you try to write to NETCDF at this point you'll have the following error:
>>> da.set_index(land=("lon", "lat")).to_netcdf("data.nc")
NotImplementedError: variable 'land' is a MultiIndex, which cannot yet be serialized to netCDF files (https://github.com/pydata/xarray/issues/1077). Use reset_index() to convert MultiIndex levels into coordinate variables instead.
It tells you to use the .reset_index()
method. But that's not what you want here, because it will just go back to the original da
state.
What you want from now is to use the .unstack()
method:
>>> da.set_index(land=("lon", "lat")).unstack("land")
<xarray.DataArray (time: 100, lon: 50, lat: 80)>
array([[[...]]])
Coordinates:
* time (time) int64 0 1 ... 98 99
* lon (lon) int64 0 1 ... 48 49
* lat (lat) int64 0 1 ... 78 79
It effectively kills the "land"
dimension and gives the desired output.
Upvotes: 2