Charlotte Sève

How to convert .mat to .nc using Matlab?

I would like to convert .mat data that a colleague sent me to NetCDF, in order to use it for another project. I don't usually use MATLAB, so I don't have much experience with it. I downloaded MATLAB to do the conversion and wrote a script that opens the .mat data, extracts the data from it (pld, release_date, x, y), creates a NetCDF file, defines dimensions, defines variables and writes the data to the variables.

However, the last step of my script (writing the data to the variables) is not working, and MATLAB reports this error:

Error using matlab.internal.imagesci.netcdflib
The number of input elements does not match the variable size.

Here is a link to the file I am trying to convert: https://drive.google.com/file/d/1Rxz1kBg5eLbJuEfHpPYPDrjUjfoacAD1/view?usp=sharing

This .mat data is a 1x1 structure with 4 fields: pld, release_date, x and y.

This is the code I am using for now:

% Define the path to the folder containing .mat files
folder_path = 'D:/.../output_test';

% Get a list of .mat files in the folder
mat_files = dir(fullfile(folder_path, '*.mat'));


% Loop through each .mat file
for i = 1:numel(mat_files)
    mat_file = mat_files(i);
    mat_data = load(fullfile(folder_path, mat_file.name));
    
    % Extract data from the loaded struct
    pld = mat_data.pld;
    release_date = mat_data.release_date;
    x = mat_data.x;
    y = mat_data.y;
    
    % Create a NetCDF file
    nc_file = strrep(mat_file.name, '.mat', '.nc');
    ncid = netcdf.create(nc_file, 'NETCDF4');
    
    % Define dimensions
    dim_x = netcdf.defDim(ncid, 'x', size(x, 2));  
    dim_y = netcdf.defDim(ncid, 'y', size(x, 2));  
    
    % Define variables
    var_pld = netcdf.defVar(ncid, 'pld', 'double', [dim_x, dim_y]);
    var_release_date = netcdf.defVar(ncid, 'release_date', 'double', [dim_x, dim_y]);
    var_x = netcdf.defVar(ncid, 'x', 'double', [dim_x, dim_y]);
    var_y = netcdf.defVar(ncid, 'y', 'double', [dim_x, dim_y]);
    
    % Complete the definition of the NetCDF file
    netcdf.endDef(ncid);
    
    % Write data to variables
    netcdf.putVar(ncid, var_pld, pld);
    netcdf.putVar(ncid, var_release_date, release_date);
    netcdf.putVar(ncid, var_x, x);
    netcdf.putVar(ncid, var_y, y);
    
    % Close the NetCDF file
    netcdf.close(ncid);
end

Answers (1)

Mikael Öhman

First, let's clarify the MATLAB part. You are correct regarding the sizes of each of these matrices. The size function in MATLAB gives you the length of an array along each dimension you ask for:

  • size(x) equals the vector [4492500, 72]
  • size(x, 1) equals the number 4492500.
  • size(x, 2) equals the number 72.
  • size(x, 72) equals the number 1, because every dimension past the second is an implicit trailing singleton (length-1) dimension of this 2D matrix. This is useless here.
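To see this on something small, the sketch below uses a 5x3 zero matrix as a stand-in for the real 4492500x72 data (the toy sizes are only for illustration):

```matlab
% Toy stand-in for the real 4492500 x 72 matrix: 5 rows, 3 columns
x = zeros(5, 3);

sz = size(x);          % row vector [5 3]: one entry per dimension
n_rows = size(x, 1);   % 5: length along dimension 1
n_cols = size(x, 2);   % 3: length along dimension 2
n_72 = size(x, 72);    % 1: any dimension past the 2nd is a trailing singleton

% Two outputs split the size vector into rows and columns
[nr, nc] = size(x);

fprintf('size(x) = [%d %d], rows = %d, cols = %d, size(x,72) = %d\n', ...
    sz(1), sz(2), n_rows, n_cols, n_72);
```

Applied to the question's code, this means size(x, 2) returns 72 for both dim_x and dim_y, so all four variables were defined as 72x72 while the data being written has 4492500 rows, which is what the "number of input elements does not match the variable size" error complains about.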

Now, if all we wanted were to store this data in a standard format, we could use HDF5 and not even care how many dimensions each dataset has (I'm just guessing the proper DataType here based on the values in the example file):

% Inside the loop from the question, after mat_data has been loaded
h5_file = strrep(mat_file.name, '.mat', '.h5');

h5create(h5_file, '/pld', size(mat_data.pld), DataType='uint8');
h5create(h5_file, '/release_date', size(mat_data.release_date), DataType='int8');
h5create(h5_file, '/x', size(mat_data.x), DataType='double');
h5create(h5_file, '/y', size(mat_data.y), DataType='double');

h5write(h5_file, '/pld', mat_data.pld);
h5write(h5_file, '/release_date', mat_data.release_date);
h5write(h5_file, '/x', mat_data.x);
h5write(h5_file, '/y', mat_data.y);
h5disp(h5_file); % just to inspect

For the example file, this prints:

HDF5 test.h5 
Group '/' 
    Dataset 'pld' 
        Size:  4492500x1
        MaxSize:  4492500x1
        Datatype:   H5T_STD_U8LE (uint8)
        ChunkSize:  []
        Filters:  none
        FillValue:  0
    Dataset 'release_date' 
        Size:  4492500x1
        MaxSize:  4492500x1
        Datatype:   H5T_STD_I8LE (int8)
        ChunkSize:  []
        Filters:  none
        FillValue:  0
    Dataset 'x' 
        Size:  4492500x72
        MaxSize:  4492500x72
        Datatype:   H5T_IEEE_F64LE (double)
        ChunkSize:  []
        Filters:  none
        FillValue:  0.000000
    Dataset 'y' 
        Size:  4492500x72
        MaxSize:  4492500x72
        Datatype:   H5T_IEEE_F64LE (double)
        ChunkSize:  []
        Filters:  none
        FillValue:  0.000000

You could, of course, also opt to create groups and whatnot here.
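As a minimal sketch of what that could look like, the snippet below writes toy data into two groups. The group names /release and /tracks, the attribute name and text, and the toy values are all hypothetical. h5create creates any missing parent groups in the dataset path, and h5writeatt attaches an attribute to an existing group.

```matlab
% Throwaway file and toy data for this sketch
h5_file = [tempname '.h5'];
pld = uint8([10; 20; 30]);
x = rand(3, 4);

% Hypothetical layout: 1-D values under /release, 2-D arrays under /tracks.
% h5create creates the missing parent groups (/release, /tracks) automatically.
h5create(h5_file, '/release/pld', size(pld), 'Datatype', 'uint8');
h5create(h5_file, '/tracks/x', size(x), 'Datatype', 'double');

h5write(h5_file, '/release/pld', pld);
h5write(h5_file, '/tracks/x', x);

% Attach a free-text attribute to a group (name and text are made up)
h5writeatt(h5_file, '/tracks', 'comment', 'converted from .mat');

h5disp(h5_file);   % inspect the resulting group structure
```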


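If we want to use the NetCDF format, we need to understand how it is designed and what it means by dimensions. A dimension is a named axis with a fixed length, the shape of every variable is given by the list of dimensions it is defined over, and netcdf.putVar expects exactly as many elements as that shape holds.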
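A small, self-contained way to see how NetCDF dimensions constrain what netcdf.putVar accepts (and to reproduce the question's error) is to define one variable whose dimensions match the data and one built the way the question does, from size(x, 2) twice. The toy 5x3 matrix and the names row, col, a, b, x_ok and x_bad are made up for this sketch:

```matlab
% Toy stand-in for the real 4492500 x 72 matrix
x = rand(5, 3);
nc_file = [tempname '.nc'];   % throwaway file for this sketch

ncid = netcdf.create(nc_file, 'NETCDF4');

% Matching dimensions: 5 x 3 = 15 elements, same as numel(x)
dim_row = netcdf.defDim(ncid, 'row', size(x, 1));
dim_col = netcdf.defDim(ncid, 'col', size(x, 2));
var_ok = netcdf.defVar(ncid, 'x_ok', 'double', [dim_row, dim_col]);

% The question's pattern: both dimensions from size(x, 2) -> 3 x 3 = 9 elements
dim_a = netcdf.defDim(ncid, 'a', size(x, 2));
dim_b = netcdf.defDim(ncid, 'b', size(x, 2));
var_bad = netcdf.defVar(ncid, 'x_bad', 'double', [dim_a, dim_b]);

netcdf.endDef(ncid);

% 15 elements into a 15-element variable: accepted
netcdf.putVar(ncid, var_ok, x);

% 15 elements into a 9-element variable: rejected with an error
try
    netcdf.putVar(ncid, var_bad, x);
    bad_write_failed = false;
catch err
    bad_write_failed = true;
    fprintf('putVar failed: %s\n', err.message);
end

netcdf.close(ncid);
```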

I don't know what this data means, so you have to define what the dimension with 4492500 elements means, and what the dimension with 72 elements means. These are not x and y dimensions; the question is what the i and j indices mean in x(i,j). I am assuming that these variables are all connected, given the recurring sizes, but technically they could be unrelated.

I'm going to guess these are many samples over time, so 4492500 is the time dimension. And then 72 gets to be the, uh... location dimension? I really can't tell from the data what it might be, since it's all NaN values.

dim_time = netcdf.defDim(ncid, 'time', size(x, 1));  % 4.5 million long
dim_location = netcdf.defDim(ncid, 'location', size(x, 2));  % 72 long
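Whether time and location is the right reading cannot be settled from an all-NaN file. On a file that does contain values, one hedged way to probe the layout is to count the non-NaN entries along each dimension: the pattern (whole columns filled, or rows that stop part-way) may hint at what each index means. The toy matrix below is invented purely for illustration:

```matlab
% Toy stand-in for x: mostly NaN, with a few made-up finite values
x = nan(6, 4);
x(1:3, 1) = [0.1; 0.2; 0.3];      % one column partly filled
x(2, 2:4) = [1.0, 1.1, 1.2];      % one row filled across the other columns

valid = ~isnan(x);
rows_with_data = find(any(valid, 2));   % indices of rows holding at least one value
per_column = sum(valid, 1);             % count of non-NaN values in each column

fprintf('%d of %d rows contain data\n', numel(rows_with_data), size(x, 1));
fprintf('non-NaN values per column: %s\n', mat2str(per_column));
```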

Now we are ready to define the variables:

var_pld = netcdf.defVar(ncid, 'pld', 'ubyte', [dim_time]);
var_release_date = netcdf.defVar(ncid, 'release_date', 'byte', [dim_time]);
var_x = netcdf.defVar(ncid, 'x', 'double', [dim_time, dim_location]);
var_y = netcdf.defVar(ncid, 'y', 'double', [dim_time, dim_location]);

(rest of the code should work as is)
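For reference, here is a sketch of the whole loop with these dimensions plugged in. To keep it self-contained, it first writes a small toy .mat file (6 rows x 4 columns) with the same field names into a temporary folder; point folder_path at your own folder instead. Two small departures from the question's code are assumptions of this sketch: the .nc file is written next to its .mat file rather than into the current folder, and pld and release_date are cast explicitly to uint8/int8 to match the ubyte/byte NetCDF types (such casts silently saturate out-of-range values and turn NaN into 0).

```matlab
% --- Toy input so the sketch runs on its own (replace with your real folder) ---
folder_path = tempname;            % the question uses 'D:/.../output_test'
mkdir(folder_path);
pld = (1:6)';                      % toy values, stored as double
release_date = (1:6)';
x = rand(6, 4);                    % 6 "time" rows x 4 "location" columns
y = rand(6, 4);
save(fullfile(folder_path, 'toy.mat'), 'pld', 'release_date', 'x', 'y');

% --- Conversion loop ---
mat_files = dir(fullfile(folder_path, '*.mat'));
for i = 1:numel(mat_files)
    mat_file = mat_files(i);
    mat_data = load(fullfile(folder_path, mat_file.name));

    % Write the .nc next to its .mat file
    nc_file = fullfile(folder_path, strrep(mat_file.name, '.mat', '.nc'));
    ncid = netcdf.create(nc_file, 'NETCDF4');   % 'ubyte' needs the NETCDF4 format

    % Rows = time, columns = location (the guess made above)
    dim_time = netcdf.defDim(ncid, 'time', size(mat_data.x, 1));
    dim_location = netcdf.defDim(ncid, 'location', size(mat_data.x, 2));

    var_pld = netcdf.defVar(ncid, 'pld', 'ubyte', dim_time);
    var_release_date = netcdf.defVar(ncid, 'release_date', 'byte', dim_time);
    var_x = netcdf.defVar(ncid, 'x', 'double', [dim_time, dim_location]);
    var_y = netcdf.defVar(ncid, 'y', 'double', [dim_time, dim_location]);
    netcdf.endDef(ncid);

    % Cast so each MATLAB class matches its NetCDF type
    netcdf.putVar(ncid, var_pld, uint8(mat_data.pld));
    netcdf.putVar(ncid, var_release_date, int8(mat_data.release_date));
    netcdf.putVar(ncid, var_x, mat_data.x);
    netcdf.putVar(ncid, var_y, mat_data.y);

    netcdf.close(ncid);
end
```

MATLAB's low-level netcdf functions list dimensions fastest-varying first, matching MATLAB's column-major layout, so x should read back in MATLAB as 4492500x72, while C-ordered tools such as ncdump would show it as x(location, time).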

Note that I only had one file to go on. If pld and release_date can take larger values, you should use int for them instead of uint8/int8 (ubyte/byte in NetCDF).
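A hedged sketch of that check: before choosing the NetCDF type, test whether the values are finite whole numbers inside the 8-bit ranges (0..255 for uint8/ubyte, -128..127 for int8/byte), and otherwise fall back to int (a 32-bit signed integer in NetCDF), or to double if there are NaNs or fractions. The toy pld values are made up:

```matlab
% Toy values standing in for mat_data.pld (made up; 300 does not fit in 8 bits)
pld = [12; 30; 300];

% Only finite whole numbers can go into an integer type
is_integral = all(isfinite(pld)) && all(pld == round(pld));

fits_ubyte = is_integral && min(pld) >= 0 && max(pld) <= 255;      % uint8 range
fits_byte = is_integral && min(pld) >= -128 && max(pld) <= 127;    % int8 range
fits_int = is_integral && min(pld) >= double(intmin('int32')) ...
    && max(pld) <= double(intmax('int32'));                        % int32 range

if fits_ubyte
    nc_type = 'ubyte';
elseif fits_byte
    nc_type = 'byte';
elseif fits_int
    nc_type = 'int';      % NetCDF 'int' is a 32-bit signed integer
else
    nc_type = 'double';   % NaNs, fractions or very large values
end

fprintf('Suggested NetCDF type for pld: %s\n', nc_type);
```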
