Logan

Reputation: 97

HDF5 error when opening .nc files in python with xarray

As the title suggests, I'm attempting to open MERRA-2 .nc files using xarray. The specific error occurs when I try to view the values of a certain variable with a print statement. The error is as follows:

HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 5:
  #000: H5A.c line 528 in H5Aopen_by_name(): can't open attribute
    major: Attribute
    minor: Can't open object
  #001: H5VLcallback.c line 1091 in H5VL_attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #002: H5VLcallback.c line 1058 in H5VL__attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #003: H5VLnative_attr.c line 130 in H5VL__native_attr_open(): can't open attribute
    major: Attribute
    minor: Can't open object
  #004: H5Aint.c line 545 in H5A__open_by_name(): unable to load attribute info from object header
    major: Attribute
    minor: Unable to initialize object
  #005: H5Oattribute.c line 476 in H5O__attr_open_by_name(): can't open attribute
    major: Attribute
    minor: Can't open object
  #006: H5Adense.c line 394 in H5A__dense_open(): can't locate attribute in name index
    major: Attribute
    minor: Object not found

I believe this error (warning?) is thrown for every file I attempt to open. If I wait long enough, the command does eventually go through, albeit taking much longer than it should.

My code is below:

import xarray as xr

data = xr.open_mfdataset('/path/to/my/data/*.nc')
print(data.OMEGA.values)

This should print a 4-D array (my dimensions are time, lat, lon, level), i.e. the value at every coordinate. Again, it does eventually do this, but not before giving me the above warning for every single file in my directory.
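For reference, here's a lighter-weight check of the structure I'm describing (the dimension names in the comment are what I expect, not verified output):

import xarray as xr

# Open the files lazily and inspect the variable's layout before loading any values.
data = xr.open_mfdataset('/path/to/my/data/*.nc')
print(data.OMEGA.dims)   # expecting something like ('time', 'lev', 'lat', 'lon')
print(data.OMEGA.shape)  # one size per dimension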

I've looked at plenty of Stack Overflow, HDF Forum, and GitHub posts, and none of their solutions have worked or been applicable to my problem.

Upvotes: 6

Views: 4420

Answers (1)

Michael Delgado

Reputation: 15442

This error occurs due to conflicting non-Python (e.g. Fortran/C/C++) dependencies.

This commonly happens when you install packages with conda from conflicting channels, and it comes up a lot with Anaconda. Anaconda is a nice place to start, because it gives you a pre-built bundle (or "distribution") of data science packages. However, those packages all come from the defaults channel, and if you later install a package from a different channel into the base environment, you can end up with nasty conflicts like this.
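If you want to confirm this before reinstalling anything, xarray can report which binary libraries your stack is actually linked against. This is just a diagnostic sketch - the exact output depends on your install:

import xarray as xr

# Prints the versions of the underlying C libraries (libhdf5, libnetcdf)
# alongside the Python packages (netCDF4, h5netcdf, ...). Unexpected or
# mismatched versions here are a good sign of mixed channels.
xr.show_versions()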

I'd recommend uninstalling anaconda (by deleting your anaconda directory), and installing one of the following:

  • miniconda - similar to anaconda, but with no packages installed by default.
  • miniforge - a variant of miniconda which prioritizes the conda-forge channel.
  • mambaforge - mamba is a compiled variant of conda that parallelizes its work. It's a little less stable than conda and its error messages tend to be less helpful, so if you run into trouble you can always run the same command with conda to see what's happening - the two are completely interchangeable. But mamba is much, much faster.

My top recommendation would be mambaforge - it's really fast and by default helps you avoid combining different channels by setting conda-forge as the priority channel.

When using any of these, don't install packages into your base environment, except for cross-environment utilities that can themselves activate your various kernel environments, like jupyterlab or an IDE such as VSCode or Spyder. So, after deleting your anaconda directory, I'd recommend installing one of these (ideally mambaforge) and then re-installing your packages into a new environment, e.g. with mamba create -n myenv python=3.10 xarray dask netCDF4 bottleneck matplotlib scipy pandas [...].
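Once the new environment exists and is activated (e.g. mamba activate myenv), a quick sanity check is to re-run the code from the question inside it - the path below is the question's placeholder:

import xarray as xr

# In a consistent conda-forge environment, opening the files should no longer
# emit the HDF5-DIAG traceback for every file.
data = xr.open_mfdataset('/path/to/my/data/*.nc')
print(data.OMEGA.values)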

Upvotes: 6
