Reputation: 440
I am having problems trying to create a very big netCDF file in Python on a machine with 8 GB of RAM.
I created a very big array with numpy.memmap in order to keep the array on disk rather than in RAM, because its size exceeds the available RAM and swap space (RAM and swap = 8 GB each).
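For context, the array and the output file are set up roughly like this (the file names, dtype, and shape below are simplified placeholders):
import numpy as np
from netCDF4 import Dataset

# ARRAY lives on disk via memmap; dtype and shape are placeholders for illustration
ARRAY = np.memmap('big_array.dat', dtype='float32', mode='r',
                  shape=(50000, 61, 720))

ncout = Dataset('output.nc', 'w', format='NETCDF4')
ncout.createDimension('time', ARRAY.shape[0])
ncout.createDimension('latitude', ARRAY.shape[1])
ncout.createDimension('longitude', ARRAY.shape[2])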
I created a variable in the nc file with
var = ncout.createVariable('data', ARRAY.dtype,
                           ('time', 'latitude', 'longitude'),
                           chunksizes=(5000, 61, 720))
var[:] = ARRAY[:]
When the code reaches this point, it loads the ARRAY that is saved on disk into RAM, and then I get a memory error.
How can I save such a big file?
Thanks.
Upvotes: 2
Views: 1391
Reputation: 16347
The best way to read and write large netCDF4 files is with xarray, which reads and writes the data in chunks automatically, using Dask under the hood.
import xarray as xr

ds = xr.open_dataset('my_big_input_file.nc',
                     chunks={'time': 5000, 'latitude': 61, 'longitude': 720})
ds.to_netcdf('my_big_output_file.nc', mode='w')
You can speed things up by using parallel computing with Dask.
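For example, starting a Dask distributed client before the write lets the chunked reads and writes run in parallel; a minimal sketch, where the worker count and memory limit are illustrative assumptions, not required values:
import xarray as xr
from dask.distributed import Client

# Start a local Dask cluster; 4 workers x 2 GB is just one possible setup
client = Client(n_workers=4, memory_limit='2GB')

ds = xr.open_dataset('my_big_input_file.nc',
                     chunks={'time': 5000, 'latitude': 61, 'longitude': 720})
ds.to_netcdf('my_big_output_file.nc', mode='w')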
Upvotes: 1
Reputation: 5843
Iterating directly over an array gives you the slices along the first dimension. Using enumerate
will give you both the slice and the index:
for ind, data_slice in enumerate(ARRAY):
    var[ind] = data_slice
I'm not positive whether netCDF4-python will keep the slices around in memory, though.
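If buffering does turn out to be a problem, one workaround is to flush to disk periodically with Dataset.sync(); a rough sketch, assuming ncout is the question's open Dataset (the flush interval of 1000 is an arbitrary choice):
# Write one time slice at a time; flush buffers every 1000 slices
for ind, data_slice in enumerate(ARRAY):
    var[ind] = data_slice
    if ind % 1000 == 0:
        ncout.sync()  # write buffered data to the file on disk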
Upvotes: 0