I tried to reduce the size of a NetCDF file by halving the temporal resolution of its variables, using the following:
infilename = 'original_file.nc4';
outfilename = 'new_file.nc4';
%% CREATE OUTPUT NETCDF FILE
ncid_out = netcdf.create(outfilename,'NETCDF4');
%% OPEN THE INPUT NETCDF FILE
ncid_in = netcdf.open(infilename,'NOWRITE'); % open original file in read-only mode
[ndims,nvars] = netcdf.inq(ncid_in);
%% DEFINE NEW DIMENSIONS
for d = 0 : ndims-1
[dimname,dimlen] = netcdf.inqDim(ncid_in,d); % get dimension from input file
if strcmp(dimname,'time')
netcdf.defDim(ncid_out,dimname,ceil(dimlen/2)); % new time dimension with half the resolution; ceil matches the length of 1:2:dimlen when dimlen is odd
else
netcdf.defDim(ncid_out,dimname,dimlen); % other dimensions remain unchanged
end
end
%% DEFINE NEW VARIABLES AND ATTRIBUTES
for v = 0 : nvars-1
[varname,xtype,dimids,natts] = netcdf.inqVar(ncid_in,v); % xtype and natts are needed below
out_varid = netcdf.defVar(ncid_out,varname,xtype,dimids);
for attnum = 0 : natts-1
attname = netcdf.inqAttName(ncid_in,v,attnum);
netcdf.copyAtt(ncid_in,v,attname,ncid_out,out_varid);
end
end
%% LEAVE DEFINE MODE AND ENTER DATA MODE
netcdf.endDef(ncid_out);
for v = 0 : nvars-1
[varname,xtype,dimids,natts] = netcdf.inqVar(ncid_in,v);
var = netcdf.getVar(ncid_in,v);
out_varid = netcdf.inqVarID(ncid_out,varname);
timeid = netcdf.inqDimID(ncid_in,'time');
indt = find(dimids==timeid,1); % position of time among this variable's dimensions (empty if absent)
if ~isempty(indt) % if time is one of the dimensions
S = repmat({':'},1,numel(dimids)); % keep every index along the other dimensions
S{indt} = 1:2:size(var,indt); % keep every other time step
netcdf.putVar(ncid_out,out_varid,var(S{:})); % assign reduced variable
else
netcdf.putVar(ncid_out,out_varid,var); % assign full variable
end
end
%% CLOSE INPUT AND OUTPUT NETCDF FILES
netcdf.close(ncid_in);
netcdf.close(ncid_out);
The code runs with no error and the new file does contain variables with a time dimension half that of the original.
The original file was 1.1 GB, but the new file is 1.4 GB. I expected the new file to be about half the size of the original, since I am halving the time resolution. I don't understand how this happened.
Could you shed light on this?
NetCDF4 files can use deflation (lossless compression) to reduce size. Your original file was probably written with deflation, while the new one you wrote was not. You need to specify deflation using netcdf.defVarDeflate:
netcdf.defVarDeflate(ncid,varid,shuffle,deflate,deflateLevel)
So try adding the second line below just after the defVar call, which will give you a deflation level of 7, with shuffle on:
out_varid = netcdf.defVar(ncid_out,varname,xtype,dimids);
netcdf.defVarDeflate(ncid_out,out_varid, true, true, 7);
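If you would rather mirror whatever the original file used than hard-code level 7, you can probably query each input variable with netcdf.inqVarDeflate and pass the result on. The sketch below is only an illustration: the temporary file names, the variable name 'field', and the level of 5 are made up so that it runs on its own; in your script the query would sit in the variable-definition loop.

```matlab
% Build a small, hypothetical source file with one deflated variable.
src = [tempname '.nc4'];
dst = [tempname '.nc4'];
ncid_in = netcdf.create(src,'NETCDF4');
dimid = netcdf.defDim(ncid_in,'time',100);
varid_in = netcdf.defVar(ncid_in,'field','NC_DOUBLE',dimid);
netcdf.defVarDeflate(ncid_in,varid_in,true,true,5); % shuffle on, level 5
netcdf.endDef(ncid_in);
netcdf.putVar(ncid_in,varid_in,zeros(100,1));
netcdf.close(ncid_in);

% Reopen it and copy the definition, including the compression settings.
ncid_in = netcdf.open(src,'NOWRITE');
ncid_out = netcdf.create(dst,'NETCDF4');
varid_in = netcdf.inqVarID(ncid_in,'field');
[varname,xtype,dimids] = netcdf.inqVar(ncid_in,varid_in);
[dimname,dimlen] = netcdf.inqDim(ncid_in,dimids(1));
netcdf.defDim(ncid_out,dimname,dimlen);
varid_out = netcdf.defVar(ncid_out,varname,xtype,dimids);

% Ask the input file how this variable was compressed and reuse the answer.
[shuffle,deflate,deflateLevel] = netcdf.inqVarDeflate(ncid_in,varid_in);
if deflate
    netcdf.defVarDeflate(ncid_out,varid_out,shuffle,deflate,deflateLevel);
end

netcdf.close(ncid_in);
netcdf.close(ncid_out);
```

If the query reports that the original variables were not deflated at all, then compression is not the explanation and something else (for example chunking) would be worth checking.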
For more info, see: https://www.mathworks.com/help/matlab/ref/netcdf.defvardeflate.html?requestedDomain=www.mathworks.com
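To see the effect in isolation, here is a minimal sketch that writes the same array twice, once without and once with deflation, and compares the file sizes. The data, the dimension sizes, and the variable name 'field' are invented for the illustration, and the array is deliberately repetitive, so the saving on real data will likely be smaller.

```matlab
% Hypothetical lon x lat x time array; highly repetitive, so it compresses well.
data = repmat(single(1:360)',1,180,24);
files = {[tempname '.nc4'],[tempname '.nc4']};
useDeflate = [false true];
bytes = zeros(1,2);

for k = 1:2
    ncid = netcdf.create(files{k},'NETCDF4');
    dimids = [netcdf.defDim(ncid,'lon',360), ...
              netcdf.defDim(ncid,'lat',180), ...
              netcdf.defDim(ncid,'time',24)];
    varid = netcdf.defVar(ncid,'field','NC_FLOAT',dimids);
    if useDeflate(k)
        netcdf.defVarDeflate(ncid,varid,true,true,7); % shuffle on, level 7
    end
    netcdf.endDef(ncid);
    netcdf.putVar(ncid,varid,data);
    netcdf.close(ncid);

    d = dir(files{k});      % file size on disk
    bytes(k) = d.bytes;
end

fprintf('plain: %d bytes, deflated: %d bytes\n',bytes(1),bytes(2));
```

Running the same comparison on a slice of your own data should give a better idea of how much of the 1.1 GB versus 1.4 GB difference deflation accounts for.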