AJaramillo

Reputation: 440

Creating a big netCDF file (>10 GB) with netCDF4 in Python

I am having problems trying to create a very big netCDF file in Python on a machine with 8 GB of RAM.

I created a very big array with numpy.memmap in order to keep the array on disk rather than in RAM, because its size exceeds the available RAM and swap space (8 GB each).
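Roughly, the memmap array is created like this (a minimal sketch; the file name, dtype and shape below are placeholders, not the actual values):

import numpy as np

# Disk-backed array: data lives in 'data.dat' and is not fully loaded into RAM.
# File name, dtype and shape are illustrative placeholders.
ARRAY = np.memmap('data.dat', dtype='float32', mode='w+',
                  shape=(20000, 61, 720))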

I created a variable in the nc file with

var = ncout.createVariable('data', ARRAY.dtype,
                           ('time', 'latitude', 'longitude'),
                           chunksizes=(5000, 61, 720))

var[:] = ARRAY[:]

When the code reaches this point, it loads the ARRAY saved on disk into RAM, and then I get a memory error.

How can I save such a big file?

Thanks.

Upvotes: 2

Views: 1391

Answers (2)

Rich Signell

Reputation: 16347

The best way to read and write large NetCDF4 files is with Xarray, which reads and writes data in chunks automatically, using Dask under the hood.

import xarray as xr

ds = xr.open_dataset('my_big_input_file.nc',
                     chunks={'time': 5000, 'latitude': 61, 'longitude': 720})
ds.to_netcdf('my_big_output_file.nc', mode='w')

You can speed things up by using parallel computing with Dask.
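For example (a sketch, not tested here), you can defer the write and let Dask's distributed scheduler process the chunks in parallel; the file names and chunk sizes are the same placeholders as above:

import xarray as xr
from dask.distributed import Client

# Start a local Dask cluster; workers handle the chunks in parallel.
client = Client()

ds = xr.open_dataset('my_big_input_file.nc',
                     chunks={'time': 5000, 'latitude': 61, 'longitude': 720})

# compute=False returns a delayed object instead of writing immediately;
# the actual work runs when .compute() is called.
delayed_write = ds.to_netcdf('my_big_output_file.nc', mode='w', compute=False)
delayed_write.compute()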

Upvotes: 1

DopplerShift

Reputation: 5843

Iterating directly over an array gives you the slices along the first dimension. Using enumerate will give you both the index and the slice:

# enumerate yields (index, 2-D slice along the first dimension)
for ind, data_slice in enumerate(ARRAY):
    var[ind] = data_slice  # write one slice at a time

I'm not positive whether netCDF4-python will keep the slices around in memory, though.
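Putting it all together, a minimal end-to-end sketch might look like this (file names, shape, dtype and chunksizes are placeholders taken from the question); calling Dataset.sync() periodically flushes whatever netCDF4-python has buffered to disk:

import numpy as np
from netCDF4 import Dataset

# Placeholders: adjust the file names, shape and dtype to your data.
ARRAY = np.memmap('data.dat', dtype='float32', mode='r',
                  shape=(20000, 61, 720))

ncout = Dataset('output.nc', 'w')
ncout.createDimension('time', ARRAY.shape[0])
ncout.createDimension('latitude', ARRAY.shape[1])
ncout.createDimension('longitude', ARRAY.shape[2])
var = ncout.createVariable('data', ARRAY.dtype,
                           ('time', 'latitude', 'longitude'),
                           chunksizes=(5000, 61, 720))

for ind, data_slice in enumerate(ARRAY):
    var[ind] = data_slice
    if ind % 1000 == 0:
        ncout.sync()  # flush buffered data to disk periodically

ncout.close()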

Upvotes: 0
