user2611761

Reputation: 199

Locking of HDF files using h5py

I have a whole bunch of code interacting with HDF files through h5py. The code has been working for years. Recently, with a change in Python environments, I am receiving this new error message:

IOError: Unable to open file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable')

What is interesting is that the error occurs intermittently in some places and persistently in others. In the places where it occurs routinely, I have looked at my code and confirmed that there is no other h5py instance connected to the file and that the last connection was properly flushed and closed. Again, this was all working fine prior to the environment change.

Here are the relevant snippets from my conda environment:

h5py    2.8.0     py27h470a237_0    conda-forge
hdf4    4.2.13    0                 conda-forge
hdf5    1.10.1    2                 conda-forge

Upvotes: 6

Views: 16060

Answers (7)

Zhang Kin

Reputation: 89

In my case, I load an .h5 file from a dataloader, and I think the error may be caused by multiprocess loading in the backend. As a workaround, we can set an environment variable to disable file locking:

os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"

or

export HDF5_USE_FILE_LOCKING=FALSE
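
Note that the variable needs to be set before the file is opened; setting it before importing h5py is the safest ordering. A minimal sketch of the in-Python variant (the file name data.h5 is only an illustration):

import os

# Disable HDF5 file locking; this must happen before the file is opened
# (setting it before importing h5py avoids any ordering surprises).
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"

import h5py

with h5py.File("data.h5", "r") as f:  # hypothetical file name
    print(list(f.keys()))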

Reference: issue comment

Upvotes: 0

kcw78

Reputation: 8006

All of the previous answers touch on aspects of the file lock issue, but none of them fully address the root cause and resolution. This answer provides technical context for the cause, summarizes solutions, and provides techniques to avoid problems in the future.

First, file locking is not just an h5py issue. It is implemented in the underlying HDF5 API for any file that is opened with write access. (That is why HDFView, which is Java based, can cause a file access conflict.) As noted in one answer, by default HDF5 has SWMR access (Single Writer; Multiple Reader); more about this at the end. Anytime a file is open for write access, the API "locks" the file to prevent write access from another process. This is done to prevent file corruption.

As a result, anytime a file is opened in write mode by a process, another process cannot access the file in write mode. When this happens, you will get the "unable to lock file" error message from the original post. (Note: this does not prevent another process from opening the file in read-only mode.)
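
If you want to see the conflict for yourself, here is a minimal sketch that reproduces it from Python (assuming an HDF5 1.10+ build with file locking enabled; the name locked.h5 is only an illustration):

import subprocess, sys
import h5py

# Hold the file open in write mode in this process...
f = h5py.File("locked.h5", "w")

# ...then try to open it in write mode from a second process.
child = "import h5py; h5py.File('locked.h5', 'w')"
rc = subprocess.call([sys.executable, "-c", child])
print(rc)  # non-zero: the child process failed with "unable to lock file"

f.close()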

There are several scenarios that can trigger a file lock:

  1. Attempting to run 2 processes with write access on the same file at the same time. Two answers mention this issue:
    • Simultaneously running another application, e.g., myapp.py
    • Running HDFView in default write access mode. (Same as above, where HDFView is the 1st application.)
  2. Attempting to open a file in write mode after a previous process exited without properly closing the file.
    • This typically happens after a crash. From my experience, this is the most common cause.
    • It can also occur if a program doesn't properly close the file on exit. (See answer about "failed to close in an obscure method".)

How to avoid file locking:

  • To avoid when using HDFView, you can set the default access mode to Read Only. Or, use the File menu and Open as... Read-Only.
  • Application code changes are required to avoid unintended file locking (either a crash or an exit without closing). The best way to do this (in Python) is to use the with/as: context manager. With the context manager, the file is auto-magically closed when the program exits (either gracefully or after an exception).

Example of with/as:

with h5py.File('my_h5_file.hdf5', 'w') as h5f:
    h5f.create_dataset('data', data=[1, 2, 3])  # code that writes to your file
# the file is closed automatically at the end of the block,
# even if an exception is raised

How to reset file lock status:
Now, if all that fails, and you still can't access a file in write mode, there is an HDF5 utility to unlock the file. (I think you need a local install of HDF5 to get this utility.) The command line entry looks like this:

h5clear --status filename.h5 (or just -s)

Write access for multiple processes:
As noted above, the default HDF5 behavior is SWMR. However, parallel write access is possible (with a little extra work). h5py uses the mpi4py package to accomplish this, but it requires a Parallel HDF5 build and h5py compiled in "MPI mode". Details are in the h5py Parallel HDF5 docs.
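
For reference, a minimal sketch of what that looks like with an MPI-enabled h5py build (run under mpiexec; the file and dataset names are only illustrations):

from mpi4py import MPI
import h5py

comm = MPI.COMM_WORLD

# every MPI rank opens the same file collectively via the 'mpio' driver
with h5py.File("parallel.h5", "w", driver="mpio", comm=comm) as f:
    dset = f.create_dataset("ranks", (comm.size,), dtype="i")
    dset[comm.rank] = comm.rank  # each process writes its own element

Run it with something like mpiexec -n 4 python demo.py.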

Upvotes: 4

Matheus Araujo

Reputation: 5749

I had another process running that I did not realize was there. How I solved my problem:

  1. Used ps aux | grep myapp.py to find the process number that was running myapp.py.
  2. Killed the process using the kill command.
  3. Ran the script again.

Upvotes: 2

user2611761

Reputation: 199

In my version of this issue, the file failed to be closed in an obscure method. The interesting thing is that unlocking the file in some cases just took a restart of IPython, and in other cases took a full reboot.

Upvotes: 2

Sean

Reputation: 1193

For me, I used multiprocessing to parallelise my data processing, and the file handle was passed to the multiprocessing pool. As a result, even if I called close(), the file would not be closed until all the subprocesses spawned by the multiprocessing pool were terminated.

Remember to call close() and join() on the pool if you are using multiprocessing.

pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
task_iter = pool.imap(...)  # <- file is used in pool!
...
pool.close()  # stop accepting new tasks
pool.join()   # wait for the workers to exit so the file handle is released

Upvotes: 0

Florian Brucker

Reputation: 10355

Similar to the other answers, I had already opened the file, but in my case it was in a separate HDF5 viewer.

Upvotes: 0

Machuck

Reputation: 11

With h5py.File(), the same .h5 file can be opened for reading ("r") multiple times. But h5py doesn't support more than a single thread, and you can experience bad data with multiple concurrent readers.
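
For illustration, multiple read-only handles to the same file are allowed (assuming an existing data.h5; the name is only an example):

import h5py

f1 = h5py.File("data.h5", "r")
f2 = h5py.File("data.h5", "r")  # a second read-only handle on the same file
f1.close()
f2.close()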

Upvotes: 1
