Reputation: 491
I have a working python program that reads in a number of large netCDF files using the Dataset command from the netCDF4 module. Here is a snippet of the relevant parts:
from netCDF4 import Dataset
import glob
infile_root = 'start_of_file_name_'
for infile in sorted(glob.iglob(infile_root + '*')):
ncin = Dataset(infile,'r')
ncin.close()
I want to modify this to read in netCDF files that are gzipped. The files themselves were gzipped after creation; they are not internally compressed (i.e., the files are *.nc.gz). If I were reading in gzipped text files, the command would be:
from netCDF4 import Dataset
import glob
import gzip
infile_root = 'start_of_file_name_'
for infile in sorted(glob.iglob(infile_root + '*.gz')):
f = gzip.open(infile, 'rb')
file_content = f.read()
f.close()
After googling around for maybe half an hour and reading through the netCDF4 documentation, the only way I can come up with to do this for netCDF files is:
from netCDF4 import Dataset
import glob
import os
infile_root = 'start_of_file_name_'
for infile in sorted(glob.iglob(infile_root + '*.gz')):
os.system('gzip -d ' + infile)
ncin = Dataset(infile[:-3],'r')
ncin.close()
os.system('gzip ' + infile[:-3])
Is it possible to read gzip files with the Dataset command directly? Or without otherwise calling gzip through os?
Upvotes: 9
Views: 7554
Reputation: 1350
Reading datasets from memory is supported since netCDF4-1.2.8 (Changelog):
import netCDF4
import gzip
with gzip.open('test.nc.gz') as gz:
with netCDF4.Dataset('dummy', mode='r', memory=gz.read()) as nc:
print(nc.variables)
See the description of the memory
parameter in the Dataset
documentation.
Alternative solution using xarray:
import xarray as xr
with gzip.open("test.nc.gz") as fp:
ds = xr.open_dataset(fp)
Upvotes: 10
Reputation: 3920
Since I just had to solve the same problem, here is a ready-made solution:
import gzip
import os
import shutil
import tempfile
import netCDF4
def open_netcdf(fname):
if fname.endswith(".gz"):
infile = gzip.open(fname, 'rb')
tmp = tempfile.NamedTemporaryFile(delete=False)
shutil.copyfileobj(infile, tmp)
infile.close()
tmp.close()
data = netCDF4.Dataset(tmp.name)
os.unlink(tmp.name)
else:
data = netCDF4.Dataset(fname)
return data
Upvotes: 4
Reputation: 5843
Because NetCDF4-Python wraps the C NetCDF4 library, you're out of luck as far as using the gzip module to pass in a file-like object. The only option is, as suggested by @tdelaney, to use the gzip to extract to a temporary file.
If you happen to have any control over the creation of these files, NetCDF version 4 files support zlib compression internally, so that using gzip is superfluous. It might also be worth converting the files from version 3 to version 4 if you need to repeatedly process these files.
Upvotes: 4