Reputation: 832
I'm trying to regrid a NetCDF file from 0.125 degrees to 0.083-degree spatial scale. The netcdf contains 224 latitudes and 464 longitudes and it has daily data for one year.
I tried xarray for it but it produces this memory error:
MemoryError: Unable to allocate 103. GiB for an array with shape (13858233841,) and data type float64
How can I regrid the file with python?
Upvotes: 2
Views: 13517
Reputation: 382
The easiest way to do this is to use operators like cdo
and nco
.
For example:
cdo remapbil,target_grid infile.nc ofile.nc
The target_grid
can be a descriptor file or you can use a NetCDF file with your desired grid resolution. Take note of other regridding methods that might suit your needs. The example above is using bilinear interpolation.
Upvotes: 3
Reputation: 8085
Another way to access the cdo
functionality from within python is to make use of the Pypi cdo project:
pip install cdo
Then you can do
from cdo import Cdo
cdo=Cdo()
cdo.remapbil("target_grid",input="in.nc",output="out.nc")
where target_grid is your usual list of options
There are several methods built in for the regridding:
You can use a grid descriptor file to define the area you need to interpolate to...
in the file grid.txt
gridtype=lonlat
xfirst=X (here X is the longitude of the left hand point)
xinc=0.083
xsize=NX (here put the number of points in domain)
yfirst=Y
yinc=0.083
ysize=NY
For more details you can refer to my video guide on interpolation.
Upvotes: 3
Reputation: 3397
A Python option, using CDO as a backend, is my package nctoolkit: https://nctoolkit.readthedocs.io/en/latest/, instalable via pip (https://pypi.org/project/nctoolkit/)
It has a built in method called to_latlon which will regrid to a specified latlon grid
In your case, you would need to do:
import nctoolkit as nc
data = nc.open_data(infile)
data.to_latlon(lon = [lon_min,lon_max],lat=[lat_min,lat_max], res =[0.083, 0.083])
Upvotes: 5
Reputation: 341
Another option is try cf-python, which can (in general) regrid larger-than-memory datasets in both spherical polar coordinates and Cartesian coordinates. It uses the ESMF regridding engine to do this, so linear, first and second-order conservative, nearest neighbour, etc. regridding methods are available.
Here is an example of the kind of regridding that you need:
import cf
import numpy
f = cf.example_field(2) # Use cf.read to read your own data
print('Source field:')
print(f)
# Define the output grid
lat = cf.DimensionCoordinate(
data=cf.Data(numpy.arange(-90, 90.01, 0.083), 'degreesN'))
lon = cf.DimensionCoordinate(
data=cf.Data(numpy.arange(0, 360, 0.083), 'degreesE'))
# Regrid the field
g = f.regrids({'latitude': lat, 'longitude': lon}, method='linear')
print('\nRegridded field:')
print(g)
which produces:
Source field:
Field: air_potential_temperature (ncvar%air_potential_temperature)
------------------------------------------------------------------
Data : air_potential_temperature(time(36), latitude(5), longitude(8)) K
Cell methods : area: mean
Dimension coords: time(36) = [1959-12-16 12:00:00, ..., 1962-11-16 00:00:00]
: latitude(5) = [-75.0, ..., 75.0] degrees_north
: longitude(8) = [22.5, ..., 337.5] degrees_east
: air_pressure(1) = [850.0] hPa
Regridded field:
Field: air_potential_temperature (ncvar%air_potential_temperature)
------------------------------------------------------------------
Data : air_potential_temperature(time(36), latitude(2169), longitude(4338)) K
Cell methods : area: mean
Dimension coords: time(36) = [1959-12-16 12:00:00, ..., 1962-11-16 00:00:00]
: latitude(2169) = [-90.0, ..., 89.94399999999655] degreesN
: longitude(4338) = [0.0, ..., 359.971] degreesE
: air_pressure(1) = [850.0] hPa
There are plenty of options to get the destination grid from other fields, as well as defining it explicitly. More details can be found in the documentation
cf-python will infer which axes are X and Y, etc from the CF metadata attached to the dataset, but if that is missing then there are always ways to manually set it or work around it.
Upvotes: 5
Reputation: 127
Xarray uses something called 'lazy loading' to try and avoid using too much memory. Somewhere in your code, you are using a command which loads the entirety of the data into memory, which it cannot do. Instead, you should specify the calculation, then save the result directly to file. Xarray will perform the calculation a chunk at a time without loading everything into memory.
An example of regridding might look something like this:
da_input = open_dataarray(
'input.nc') # the file the data will be loaded from
regrid_axis = np.arange(-90, 90, 0.125) # new coordinates
da_output = da_input.interp(lat=regrid_axis) # specify calculation
da_ouput.to_netcdf('output.nc') # save direct to file
Doing da_input.load()
, or da_output.compute()
, for example, would cause all the data to be loaded into memory - which you want to avoid.
Upvotes: 4