lsr729
lsr729

Reputation: 832

Regrid Netcdf file in python

I'm trying to regrid a NetCDF file from 0.125 degrees to 0.083-degree spatial scale. The netcdf contains 224 latitudes and 464 longitudes and it has daily data for one year.

I tried xarray for it but it produces this memory error: MemoryError: Unable to allocate 103. GiB for an array with shape (13858233841,) and data type float64

How can I regrid the file with python?

Upvotes: 2

Views: 13517

Answers (5)

Dani56
Dani56

Reputation: 382

The easiest way to do this is to use operators like cdo and nco.

For example:

cdo remapbil,target_grid infile.nc ofile.nc

The target_grid can be a descriptor file or you can use a NetCDF file with your desired grid resolution. Take note of other regridding methods that might suit your needs. The example above is using bilinear interpolation.

Upvotes: 3

ClimateUnboxed
ClimateUnboxed

Reputation: 8085

Another way to access the cdo functionality from within python is to make use of the Pypi cdo project:

pip install cdo 

Then you can do

from cdo import Cdo
cdo=Cdo()
cdo.remapbil("target_grid",input="in.nc",output="out.nc")

where target_grid is your usual list of options

  1. a nc file to use the grid from
  2. a regular grid specifier e.g. r360x180
  3. a txt file with a grid descriptor (see below)

There are several methods built in for the regridding:

  • remapbic : bicubic interpolation
  • remapbil : bilinear interpolation
  • remapnn : nearest neighbour interpolation
  • remapcon : first order conservative remapping
  • remapcon2 : 2nd order conservative remapping

You can use a grid descriptor file to define the area you need to interpolate to...

in the file grid.txt

gridtype=lonlat
xfirst=X   (here X is the longitude of the left hand point)
xinc=0.083
xsize=NX   (here put the number of points in domain)
yfirst=Y
yinc=0.083
ysize=NY

For more details you can refer to my video guide on interpolation.

Upvotes: 3

Robert Wilson
Robert Wilson

Reputation: 3397

A Python option, using CDO as a backend, is my package nctoolkit: https://nctoolkit.readthedocs.io/en/latest/, instalable via pip (https://pypi.org/project/nctoolkit/)

It has a built in method called to_latlon which will regrid to a specified latlon grid

In your case, you would need to do:

import nctoolkit as nc
data = nc.open_data(infile)
data.to_latlon(lon = [lon_min,lon_max],lat=[lat_min,lat_max], res =[0.083, 0.083])

Upvotes: 5

dhassell
dhassell

Reputation: 341

Another option is try cf-python, which can (in general) regrid larger-than-memory datasets in both spherical polar coordinates and Cartesian coordinates. It uses the ESMF regridding engine to do this, so linear, first and second-order conservative, nearest neighbour, etc. regridding methods are available.

Here is an example of the kind of regridding that you need:

import cf
import numpy

f = cf.example_field(2) # Use cf.read to read your own data

print('Source field:')
print(f)

# Define the output grid
lat = cf.DimensionCoordinate(
           data=cf.Data(numpy.arange(-90, 90.01, 0.083), 'degreesN'))
lon = cf.DimensionCoordinate(
          data=cf.Data(numpy.arange(0, 360, 0.083), 'degreesE'))

# Regrid the field
g = f.regrids({'latitude': lat, 'longitude': lon}, method='linear')

print('\nRegridded field:')
print(g)

which produces:

Source field:
Field: air_potential_temperature (ncvar%air_potential_temperature)
------------------------------------------------------------------
Data            : air_potential_temperature(time(36), latitude(5), longitude(8)) K
Cell methods    : area: mean
Dimension coords: time(36) = [1959-12-16 12:00:00, ..., 1962-11-16 00:00:00]
                : latitude(5) = [-75.0, ..., 75.0] degrees_north
                : longitude(8) = [22.5, ..., 337.5] degrees_east
                : air_pressure(1) = [850.0] hPa

Regridded field:
Field: air_potential_temperature (ncvar%air_potential_temperature)
------------------------------------------------------------------
Data            : air_potential_temperature(time(36), latitude(2169), longitude(4338)) K
Cell methods    : area: mean
Dimension coords: time(36) = [1959-12-16 12:00:00, ..., 1962-11-16 00:00:00]
                : latitude(2169) = [-90.0, ..., 89.94399999999655] degreesN
                : longitude(4338) = [0.0, ..., 359.971] degreesE
                : air_pressure(1) = [850.0] hPa

There are plenty of options to get the destination grid from other fields, as well as defining it explicitly. More details can be found in the documentation

cf-python will infer which axes are X and Y, etc from the CF metadata attached to the dataset, but if that is missing then there are always ways to manually set it or work around it.

Upvotes: 5

Charles
Charles

Reputation: 127

Xarray uses something called 'lazy loading' to try and avoid using too much memory. Somewhere in your code, you are using a command which loads the entirety of the data into memory, which it cannot do. Instead, you should specify the calculation, then save the result directly to file. Xarray will perform the calculation a chunk at a time without loading everything into memory.

An example of regridding might look something like this:

da_input = open_dataarray(
    'input.nc') # the file the data will be loaded from
regrid_axis = np.arange(-90, 90, 0.125) # new coordinates
da_output = da_input.interp(lat=regrid_axis) # specify calculation
da_ouput.to_netcdf('output.nc') # save direct to file

Doing da_input.load(), or da_output.compute(), for example, would cause all the data to be loaded into memory - which you want to avoid.

Upvotes: 4

Related Questions