Reputation: 21
I would like to convert .mat data that a colleague sent me to NetCDF so that I can use it in another project. I do not usually use MATLAB and have little experience with it. I downloaded MATLAB to do the conversion and wrote a script that opens the .mat file, extracts the data from it (pld, release_date, x, y), creates a NetCDF file, defines dimensions, defines variables, and writes the data to the variables.
However, the last step of my script is not working, and MATLAB reports this error:
Error using matlab.internal.imagesci.netcdflib
The number of input elements does not match the variable size.
Here is a link to the file I am trying to convert : https://drive.google.com/file/d/1Rxz1kBg5eLbJuEfHpPYPDrjUjfoacAD1/view?usp=sharing
This .mat data is a 1x1 structure with 4 fields:
pld: 4492500x1 double
release_date: 4492500x1 double
x: 4492500x72 double
y: 4492500x72 double
This is the code I am using for now:
% Define the path to the folder containing .mat files
folder_path = 'D:/.../output_test';
% Get a list of .mat files in the folder
mat_files = dir(fullfile(folder_path, '*.mat'));
% Loop through each .mat file
for i = 1:numel(mat_files)
    mat_file = mat_files(i);
    mat_data = load(fullfile(folder_path, mat_file.name));
    % Extract data from the loaded struct
    pld = mat_data.pld;
    release_date = mat_data.release_date;
    x = mat_data.x;
    y = mat_data.y;
    % Create a NetCDF file
    nc_file = strrep(mat_file.name, '.mat', '.nc');
    ncid = netcdf.create(nc_file, 'NETCDF4');
    % Define dimensions
    dim_x = netcdf.defDim(ncid, 'x', size(x, 2));
    dim_y = netcdf.defDim(ncid, 'y', size(x, 2));
    % Define variables
    var_pld = netcdf.defVar(ncid, 'pld', 'double', [dim_x, dim_y]);
    var_release_date = netcdf.defVar(ncid, 'release_date', 'double', [dim_x, dim_y]);
    var_x = netcdf.defVar(ncid, 'x', 'double', [dim_x, dim_y]);
    var_y = netcdf.defVar(ncid, 'y', 'double', [dim_x, dim_y]);
    % Complete the definition of the NetCDF file
    netcdf.endDef(ncid);
    % Write data to variables
    netcdf.putVar(ncid, var_pld, pld);
    netcdf.putVar(ncid, var_release_date, release_date);
    netcdf.putVar(ncid, var_x, x);
    netcdf.putVar(ncid, var_y, y);
    % Close the NetCDF file
    netcdf.close(ncid);
end
Upvotes: 2
Views: 292
Reputation: 2375
First, let's clarify the MATLAB part here.
You are correct regarding the sizes of each of these matrices.
The size function in MATLAB returns the length of each dimension, or of the one dimension you specify:
size(x) equals the vector [4492500, 72].
size(x, 1) equals the number 4492500.
size(x, 2) equals the number 72.
size(x, 72) equals the number 1 (because it is implied that this 2D matrix is flat in all higher dimensions). This is useless.
In your script both dim_x and dim_y are defined with size(x, 2), so every variable is declared as 72x72 (5184 elements). That is why putVar rejects arrays with 4492500 or more elements.
Now, if all we wanted was to store this data in a standard format, we could use HDF5 and not even care about how many dimensions each dataset has (I'm just guessing the proper DataType here based on the values in the example file):
h5_file = strrep(mat_file.name, '.mat', '.h5');
h5create(h5_file, '/pld', size(mat_data.pld), DataType='uint8');
h5create(h5_file, '/release_date', size(mat_data.release_date), DataType='int8');
h5create(h5_file, '/x', size(mat_data.x), DataType='double');
h5create(h5_file, '/y', size(mat_data.y), DataType='double');
h5write(h5_file, '/pld', mat_data.pld);
h5write(h5_file, '/release_date', mat_data.release_date);
h5write(h5_file, '/x', mat_data.x);
h5write(h5_file, '/y', mat_data.y);
h5disp(h5_file); % just to inspect
HDF5 test.h5
Group '/'
Dataset 'pld'
Size: 4492500x1
MaxSize: 4492500x1
Datatype: H5T_STD_U8LE (uint8)
ChunkSize: []
Filters: none
FillValue: 0
Dataset 'release_date'
Size: 4492500x1
MaxSize: 4492500x1
Datatype: H5T_STD_I8LE (int8)
ChunkSize: []
Filters: none
FillValue: 0
Dataset 'x'
Size: 4492500x72
MaxSize: 4492500x72
Datatype: H5T_IEEE_F64LE (double)
ChunkSize: []
Filters: none
FillValue: 0.000000
Dataset 'y'
Size: 4492500x72
MaxSize: 4492500x72
Datatype: H5T_IEEE_F64LE (double)
ChunkSize: []
Filters: none
FillValue: 0.000000
and you could of course opt to create groups and whatnot here.
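As a minimal sketch of the group idea, assuming small stand-in arrays and hypothetical group names (/meta and /tracks are not from the original data), h5create builds any intermediate groups named in the dataset path:

```matlab
% Hypothetical file and group names, chosen only for illustration.
h5_file = fullfile(tempdir, 'groups_demo.h5');
if exist(h5_file, 'file') == 2
    delete(h5_file);   % h5create errors if the dataset already exists
end

% Small stand-in arrays with the same shape pattern as the real data.
pld = uint8([10; 20; 30]);   % 3x1
x   = rand(3, 4);            % 3x4

% The intermediate groups '/meta' and '/tracks' are created automatically.
h5create(h5_file, '/meta/pld', size(pld), 'Datatype', 'uint8');
h5create(h5_file, '/tracks/x', size(x), 'Datatype', 'double');
h5write(h5_file, '/meta/pld', pld);
h5write(h5_file, '/tracks/x', x);

% Read the datasets back to confirm the round trip.
pld_back = h5read(h5_file, '/meta/pld');
x_back   = h5read(h5_file, '/tracks/x');
```

Running h5disp(h5_file) afterwards should list the two groups with one dataset each.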
If we want to use the NetCDF format, we need to understand how it is designed, and what it means by dimensions.
I don't know what this data means. You have to define what the dimension(s) with 4492500 elements mean, and what the dimension(s) with 72 elements mean.
These are not x and y dimensions. The question is what the i and j indices mean in x(i,j).
I am assuming that these are all connected given the recurring sizes, but technically they could be unrelated.
I'm going to guess these are many samples over time, so 4492500 is the time dimension. And then 72 gets to be the, uh... location dimension? I really can't tell from the data what it might be, since it's all NaN values.
dim_time = netcdf.defDim(ncid, 'time', size(x, 1)); % 4.5 million long
dim_location = netcdf.defDim(ncid, 'location', size(x, 2)); % 72 long
Now we are ready to define the variables:
var_pld = netcdf.defVar(ncid, 'pld', 'ubyte', [dim_time]);
var_release_date = netcdf.defVar(ncid, 'release_date', 'byte', [dim_time]);
var_x = netcdf.defVar(ncid, 'x', 'double', [dim_time, dim_location]);
var_y = netcdf.defVar(ncid, 'y', 'double', [dim_time, dim_location]);
(rest of the code should work as is)
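To see the whole sequence work end to end, here is a small self-contained sketch. It assumes tiny stand-in arrays (5x1 and 5x3 instead of 4492500x1 and 4492500x72), a hypothetical file name, and 'double' for both variables so that no type conversion is involved. It is meant only to illustrate that each array's shape must match the dimensions its variable was declared with.

```matlab
% Stand-in sizes; the real data would use 4492500 and 72.
n_time = 5;
n_location = 3;
pld = (1:n_time)';                 % n_time x 1
x   = rand(n_time, n_location);    % n_time x n_location

nc_file = fullfile(tempdir, 'dims_demo.nc');   % hypothetical file name
if exist(nc_file, 'file') == 2
    delete(nc_file);               % start from a clean file
end
ncid = netcdf.create(nc_file, 'NETCDF4');

% One dimension per distinct array length.
dim_time     = netcdf.defDim(ncid, 'time', n_time);
dim_location = netcdf.defDim(ncid, 'location', n_location);

% 1-D variable on time only, 2-D variable on (time, location).
var_pld = netcdf.defVar(ncid, 'pld', 'double', dim_time);
var_x   = netcdf.defVar(ncid, 'x', 'double', [dim_time, dim_location]);
netcdf.endDef(ncid);

% Guard against the mismatch that triggers the putVar error.
assert(numel(pld) == n_time);
assert(isequal(size(x), [n_time, n_location]));

netcdf.putVar(ncid, var_pld, pld);
netcdf.putVar(ncid, var_x, x);
netcdf.close(ncid);

% Read back with the high-level interface to check the shapes.
pld_back = ncread(nc_file, 'pld');
x_back   = ncread(nc_file, 'x');
```

The two assert calls before putVar are an optional addition: they fail early with a clear location if an array does not match its declared dimensions.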
Note that I only had one file to go on. If pld and release_date can take larger values, you should just use int for them instead of *int8 and *byte.
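If you would rather let the data decide, one possible check is sketched below. The sample vector and the variable name nc_type are assumptions for illustration. Anything containing NaN or fractional values falls back to 'double', because integer types cannot represent NaN.

```matlab
% Stand-in for mat_data.pld; replace with the real vector.
pld = [0; 12; 255];

% Integer storage is only safe for whole numbers without NaN.
is_whole = ~any(isnan(pld)) && all(pld == floor(pld));

if is_whole && all(pld >= 0) && all(pld <= double(intmax('uint8')))
    nc_type = 'ubyte';     % fits in an unsigned 8-bit integer
elseif is_whole && all(pld >= double(intmin('int32'))) ...
        && all(pld <= double(intmax('int32')))
    nc_type = 'int';       % fits in a signed 32-bit integer
else
    nc_type = 'double';    % NaN, fractions, or very large values
end

disp(nc_type);
```

The resulting nc_type can then be passed as the type argument of netcdf.defVar. To be safe across all files, you would run the check on every file before settling on a type.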
Upvotes: 1