0000011111

Reputation: 133

NetCDF file is larger after attempting to reduce its size with the MATLAB netcdf package

I tried to reduce the size of a NetCDF file by halving the temporal resolution of its variables, using the following:

infilename = 'original_file.nc4';
outfilename = 'new_file.nc4';
%% CREATE OUTPUT NETCDF FILE
ncid_out = netcdf.create(outfilename,'NETCDF4');
%% OPEN THE INPUT NETCDF FILE
ncid_in  = netcdf.open(infilename,'NOWRITE'); % open original file in read-only mode
[ndims,nvars] = netcdf.inq(ncid_in);
%% DEFINE NEW DIMENSIONS
for d = 0 : ndims-1
    [dimname,dimlen] = netcdf.inqDim(ncid_in,d); % get dimension from input file
    if strcmp(dimname,'time')
        netcdf.defDim(ncid_out,dimname,ceil(dimlen/2)); % new time dimension with half the resolution (ceil matches 1:2:dimlen when dimlen is odd)
    else netcdf.defDim(ncid_out,dimname,dimlen); % other dimensions remain unchanged
    end
end
%% DEFINE NEW VARIABLES AND ATTRIBUTES
for v = 0 : nvars-1
    [varname,xtype,dimids,natts] = netcdf.inqVar(ncid_in,v); % xtype and natts are used below
    out_varid = netcdf.defVar(ncid_out,varname,xtype,dimids);
    for attnum = 0 : natts-1
        attname = netcdf.inqAttName(ncid_in,v,attnum);
        netcdf.copyAtt(ncid_in,v,attname,ncid_out,out_varid);
    end
end
%% LEAVE DEFINE MODE AND ENTER DATA MODE
netcdf.endDef(ncid_out);
for v = 0 : nvars-1
    [varname,xtype,dimids,natts] = netcdf.inqVar(ncid_in,v);
    var = netcdf.getVar(ncid_in,v);
    out_varid = netcdf.inqVarID(ncid_out,varname);
    if ~isempty(find(dimids==netcdf.inqDimID(ncid_in,'time'),1)) % if time is one of the dimensions
        indt = knnsearch(dimids',netcdf.inqDimID(ncid_in,'time')); % find which one it is
        S = cell(1,length(dimids));
        for f = dimids
            [~,dimlen] = netcdf.inqDim(ncid_in,f); % length of the dimension
            if netcdf.inqDimID(ncid_in,netcdf.inqDim(ncid_out,f)) == dimids(indt) % if this dimension is time
                S{indt} = 1:2:dimlen; % reduce this dimension
            else S{knnsearch(dimids',netcdf.inqDimID(ncid_in,netcdf.inqDim(ncid_in,f)))} = 1:dimlen;
            end
        end
        netcdf.putVar(ncid_out,out_varid,var(S{1:end})); % assign reduced variable
    else netcdf.putVar(ncid_out,out_varid,var); % assign full variable
    end
end
%% CLOSE INPUT AND OUTPUT NETCDF FILES
netcdf.close(ncid_in);
netcdf.close(ncid_out);

The code runs with no error and the new file does contain variables with a time dimension half that of the original.

The original file was 1.1 GB, but the new file is 1.4 GB. I expected the new file to be about half the size of the original, since I am halving the time resolution. I don't understand how this happened.

Could you shed light on this?

Upvotes: 0

Views: 523

Answers (1)

Rich Signell

Reputation: 16445

NetCDF4 files can use deflation (lossless compression) to reduce size. Your original file was probably written with deflation, while the new one you wrote was not. You need to specify deflation using netcdf.defVarDeflate:

netcdf.defVarDeflate(ncid,varid,shuffle,deflate,deflateLevel) 

So try adding this line just after the defVar call, which will give you a deflation level of 7, with shuffle on:

out_varid = netcdf.defVar(ncid_out,varname,xtype,dimids);
netcdf.defVarDeflate(ncid_out,out_varid, true, true, 7);
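
If you would rather match the original file than hard-code a level, one option is to read each input variable's settings with netcdf.inqVarDeflate and pass them through. This is only a sketch: it assumes ncid_in, ncid_out, v, varname, xtype and dimids are defined as in the question's variable-definition loop, and that the input is a NetCDF-4 format file. The check on dimids is a precaution I added so that scalar variables are skipped.

```matlab
% Sketch: reuse the input variable's compression settings for the output variable.
% Assumes the surrounding loop from the question (v runs over input variable IDs).
out_varid = netcdf.defVar(ncid_out,varname,xtype,dimids);

% Query shuffle flag, deflate flag and deflate level of the input variable
[shuffle,deflate,deflateLevel] = netcdf.inqVarDeflate(ncid_in,v);

% Only request compression if the input used it and the variable is not a scalar
if deflate && ~isempty(dimids)
    netcdf.defVarDeflate(ncid_out,out_varid,shuffle,deflate,deflateLevel);
end
```

As with the hard-coded version, this has to run while the output file is still in define mode, so before netcdf.endDef(ncid_out).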

For more info, see: https://www.mathworks.com/help/matlab/ref/netcdf.defvardeflate.html?requestedDomain=www.mathworks.com
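
To confirm that the original file really was written with deflation, you could list the settings of every variable. A minimal sketch, assuming the file name from the question and a NetCDF-4 format file:

```matlab
% Sketch: print the compression settings of each variable in the original file.
infilename = 'original_file.nc4';          % file name taken from the question
ncid = netcdf.open(infilename,'NOWRITE');  % read-only
[~,nvars] = netcdf.inq(ncid);
for v = 0 : nvars-1
    varname = netcdf.inqVar(ncid,v);
    [shuffle,deflate,deflateLevel] = netcdf.inqVarDeflate(ncid,v);
    fprintf('%s: shuffle=%d, deflate=%d, level=%d\n', ...
        varname,shuffle,deflate,deflateLevel);
end
netcdf.close(ncid);
```

If deflate is reported as on for the large variables in the original file, that would be consistent with the size difference you are seeing. Running the same loop on the new file lets you compare the two.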

Upvotes: 2
