Yuhang Pan

Reputation: 45

How to read multiple NetCDF files from MODIS which have variables in groups?

Recently I tried to read MODIS cloud properties data. I tried to merge/combine the MODIS NetCDF files, but neither ncrcat nor CDO worked. Then I found that the variables in the MODIS files are stored in separate groups.

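import netCDF4 as nc

a = 'MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc'

b = nc.Dataset(a)
print(b.groups.keys())                 # list all groups in the file
c = b.groups['Cloud_Mask_Fraction']    # open one group
print(c.variables['Mean'])             # inspect its 'Mean' variable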

This prints:

dict_keys(['Solar_Zenith', 'Solar_Azimuth', 'Sensor_Zenith', 'Sensor_Azimuth', 'Cloud_Top_Pressure', 'Cloud_Mask_Fraction', 'Cloud_Mask_Fraction_Low', 'Cloud_Mask_Fraction_Mid', 'Cloud_Mask_Fraction_High', 'Cloud_Optical_Thickness_Liquid', 'Cloud_Optical_Thickness_Ice', 'Cloud_Optical_Thickness_Total', 'Cloud_Optical_Thickness_PCL_Total', 'Cloud_Optical_Thickness_Log10_Liquid', 'Cloud_Optical_Thickness_Log10_Ice', 'Cloud_Optical_Thickness_Log10_Total', 'Cloud_Particle_Size_Liquid', 'Cloud_Particle_Size_Ice', 'Cloud_Water_Path_Liquid', 'Cloud_Water_Path_Ice', 'Cloud_Retrieval_Fraction_Liquid', 'Cloud_Retrieval_Fraction_Ice', 'Cloud_Retrieval_Fraction_Total'])

<class 'netCDF4._netCDF4.Variable'>
float64 Mean(longitude, latitude)
    _FillValue: -999.0
    title: Cloud_Mask_Fraction: Mean
    units: none
path = /Cloud_Mask_Fraction
unlimited dimensions: 
current shape = (360, 180)
filling on

There are other variables in many other groups, and I need to read all the other files or merge them. So how can I read multiple NetCDF files that use groups? How can I get an array for each variable with a new time dimension, since I have to read several years of data? Can CDO, ncrcat, or xarray in Python merge this kind of NetCDF file?

Thanks a lot. Yuhang

Upvotes: 0

Views: 949

Answers (1)

dl.meteo

Reputation: 1786

I would recommend using xarray, the state-of-the-art handler for multidimensional gridded data in Python.

You need to install netCDF4, and I also recommend h5netcdf for faster processing.

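import xarray

path_to_file = 'MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc'
# The variables live in NetCDF4 groups, so pass the group name explicitly;
# without group=..., only the root group (not the grouped variables) is opened.
# if h5netcdf is installed:
data = xarray.open_dataset(path_to_file, group='Cloud_Mask_Fraction', engine='h5netcdf')
# if just netCDF4 is installed:
data = xarray.open_dataset(path_to_file, group='Cloud_Mask_Fraction')

# access variables (e.g. 'Mean'):
data['Mean']
data.Mean

# inspect the whole group:
data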

You can load multiple files into one dataset:

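# open_mfdataset needs dask; the files have no time dimension, so stack them along a new 'time' dimension
datasets = xarray.open_mfdataset([path_to_file_1, path_to_file_2], group='Cloud_Mask_Fraction', combine='nested', concat_dim='time', parallel=True)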
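Conceptually, combining the monthly files means stacking each group's (longitude, latitude) grid along a new leading time axis. The NumPy sketch below shows only that step, under some assumptions: each file has already been read (with netCDF4 or xarray) into a dict mapping group name to its `Mean` array, auto-masking is off so the `_FillValue` of -999.0 is still present, and the list is in time order. The function `stack_over_time` and the `<group>_Mean` naming are hypothetical; the prefix is there because every group stores its data under the same name `Mean`.

```python
import numpy as np

FILL_VALUE = -999.0  # _FillValue reported for the Mean variables in the question

def stack_over_time(per_file):
    """Stack a time-ordered list of {group: 2D array} dicts into {group_Mean: 3D array}.

    Hypothetical helper: every dict must hold the same groups, all with the
    same (longitude, latitude) shape. Output axes are (time, longitude, latitude).
    """
    stacked = {}
    for group in per_file[0]:
        # np.stack copies the inputs and adds a new leading (time) axis.
        cube = np.stack([np.asarray(f[group], dtype=float) for f in per_file], axis=0)
        cube[cube == FILL_VALUE] = np.nan  # treat fill values as missing
        # Prefix with the group name, since every group calls its variable "Mean".
        stacked[f"{group}_Mean"] = cube
    return stacked

# Illustrative data only: two "files" on a tiny 3 x 2 grid instead of 360 x 180.
first = {"Cloud_Mask_Fraction": np.full((3, 2), 0.5)}
second = {"Cloud_Mask_Fraction": np.array([[0.1, -999.0], [0.2, 0.3], [0.4, 0.6]])}
result = stack_over_time([first, second])
print(result["Cloud_Mask_Fraction_Mean"].shape)  # (2, 3, 2)
```

In practice xarray does this concatenation for you; the sketch is only meant to show the shape of the arrays you end up with.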

I expect some errors if your files cover different time spans, but there are ways to work around such issues.
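One such workaround is to derive real dates for the new time dimension from the file names. This sketch assumes that the `.AYYYYDDD.` token in names like `MCD06COSP_M3_MODIS.A2002182...` encodes the year and day of year, which is the usual MODIS convention, so `A2002182` would be 1 July 2002. The helper `modis_date` is hypothetical.

```python
import re
from datetime import date, timedelta

# Assumption: ".AYYYYDDD." in a MODIS file name = year + day of year.
_TOKEN = re.compile(r"\.A(\d{4})(\d{3})\.")

def modis_date(filename: str) -> date:
    """Return the date encoded in a MODIS file name's .AYYYYDDD. token."""
    match = _TOKEN.search(filename)
    if match is None:
        raise ValueError(f"no .AYYYYDDD. token in {filename!r}")
    year, day_of_year = int(match.group(1)), int(match.group(2))
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)

print(modis_date("MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc"))  # 2002-07-01
```

Sorting the file list with `sorted(paths, key=modis_date)` before opening it also guarantees that the files are concatenated in chronological order, and the resulting dates can be attached as the time coordinate.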

I enabled parallelisation (parallel=True) to speed up reading the files. Please share test data via a link to cloud storage or similar; otherwise the community cannot help you beyond suggestions like these.

PS: Please choose variable names wisely ;)

Upvotes: 1
