yegor256

Reputation: 105053

How to check if the file is still locked by current thread?

Here is the Ruby code:

File.open('a.txt', File::CREAT | File::RDWR) do |f|
  # Another thread deletes the a.txt file here
  f.flock(File::LOCK_EX | File::LOCK_NB)
  # How do I check that the file is really locked by my thread?
end

In a multi-threaded environment, when many threads try to lock the file and then remove it afterward, one thread may delete it right before another thread's flock() call. In that case, flock() still behaves as if the file were in place and reports success.

I'm trying to find a way to check whether the file is really locked by the current thread right after flock() finishes. How can I do that?

Upvotes: 2

Views: 1305

Answers (1)

Vitaliy Tsvayer

Reputation: 773

If f.flock(File::LOCK_EX | File::LOCK_NB) returns a non-false value, then f IS locked. It keeps the lock until you close the file or explicitly call f.flock(File::LOCK_UN). You don't have to check again whether it is locked. To explain what really happens, we first need to look at file system internals and the related system calls:

 File Descriptor Table       Open File Table        i-node Table      Directory Index
╒════════════════════╕       ╒═════════════╕       ╒════════════╕     ╒═════════════╕
┃3 ..................┣━━━━━━▷┃ open file1  ┣━━┳━━━▷┃ /tmp/file1 ┃◃━━━━┫ file1       ┃
┃4 ..................┣━━━━━━▷┃ open file1  ┣━━┚ ┏━▷┃ /tmp/file2 ┃◃━━━━┫ file2       ┃
┃5 ..................┣━━━┳━━▷┃ open file2  ┣━━━━┚                   
┃6 ..................┣━━━┚

The key point in this diagram is that there are two different and unrelated entry points into the i-node Table: Open File Table and Directory Index. Different system calls work with different entry points:

  • open(file_path) => finds the i-node number in the Directory Index and creates an entry in the Open File Table, referenced from the File Descriptor Table (one table per process), then increments ref_counter in the related i-node Table entry.
  • close(file_descriptor) => frees the related File Descriptor Table entry and the related Open File Table entry (unless other File Descriptors still reference it), then decrements ref_counter in the related i-node Table entry (unless the Open File entry stays open).
  • unlink(file_path) => there is no Delete system call! Unlinks the i-node from the Directory Index by removing the directory entry, then decrements ref_counter in the related i-node Table entry (unaware of the Open File Table!).
  • flock(file_descriptor) => applies/removes a lock on an entry in the Open File Table (unaware of the Directory Index!).
  • An i-node Table entry is removed (practically deleting the file) if and only if ref_counter becomes zero. This can happen after close() or after unlink().

The key point here is that unlink does not necessarily delete a file (the data) immediately! It only removes the link between the Directory Index and the i-node Table. This means that even after unlink, the file may still be open with active locks on it!
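This can be observed directly. The following is a minimal sketch, assuming a POSIX system (on Windows, deleting an open file typically fails). The helper name unlink_while_locked and the temporary directory are made up for illustration:

```ruby
require 'tmpdir'

# Hypothetical helper: lock a file, unlink it, then inspect what is left.
def unlink_while_locked
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'a.txt')
    File.open(path, File::CREAT | File::RDWR) do |f|
      # With LOCK_NB, flock returns false only when the lock is unavailable
      locked = f.flock(File::LOCK_EX | File::LOCK_NB) != false
      f.write('hello')
      f.flush
      File.unlink(path) # removes the directory entry only
      f.rewind
      { locked: locked, name_exists: File.exist?(path), data: f.read }
    end
  end
end

p unlink_while_locked
```

The name is gone from the directory, yet the descriptor still reads back the data it wrote and the lock taken before the unlink was never released.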

Keeping that in mind, imagine the following scenario with 2 threads that try to synchronise on a file using open/flock/close and to clean up using unlink:

   THREAD 1                              THREAD 2
==================================================
       |                                    |
       |                                    |
(1) OPEN (file1, CREATE)                    |
       |                             (1) OPEN (file1, CREATE)
       |                                    |
(2) LOCK-EX (FD1->i-node-1)                 |
  [start work]                       (2) LOCK-EX (FD2->i-node-1) <---
       |                                    .                       |
       |                                    .                       |
(3)  work                                   .                       |
       |                             (3) waiting loop               |
       |                                    .                       |
   [end work]                               .                       |
(4) UNLINK (file1)                          . -----------------------
(5) CLOSE (FD1)--------unlocked------> [start work]
       |                                    |
       |                                    |
(6) OPEN (file1, CREATE)                    |
       |                                    |
       |                             (5)  work
(7) LOCK-EX (FD1->i-node-2)                 |
  [start work] !!! does not wait            |
       |                                    |
(8)  work                                   |
       |                                    |
  • (1) Both threads open (and potentially create) the same file. As a result, there is a link from the Directory Index to the i-node Table. Each thread gets its own File Descriptor.
  • (2) Both threads try to get an exclusive lock using the File Descriptor they got from the open call.
  • (3) The first thread gets the lock, and the second thread is blocked (or keeps trying to get the lock in a loop).
  • (4) The first thread finishes its task and deletes (unlinks) the file. At this point the link from the Directory Index to the i-node is removed and we won't see the file in the directory listing. BUT the file is still there and is open in two threads with an active lock! It has simply lost its name.
  • (5) The first thread closes its File Descriptor and, as a result, releases the lock. The second thread then gets the lock and starts working on its task.
  • (6) The first thread repeats and tries to open a file with the same name. But is it the same file as before? No, because at this point there is no file with that name in the Directory Index. So it creates a NEW file instead, with a new i-node Table entry.
  • (7) The first thread gets a lock on the NEW file!
  • (8) We end up with two threads holding locks on two different files, UNsynchronised.

The problem in the above scenario is that open/unlink work on Directory Index, while lock/close work on File Descriptors, which are not related to each other.
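The core of steps (4) to (7) can be reproduced without threads. The sketch below is an illustration under assumptions (POSIX system, hypothetical helper name two_locks_one_name): `old` plays the descriptor that still points at the unlinked file, and `fresh` plays the descriptor obtained by re-opening the same name.

```ruby
require 'tmpdir'

# Hypothetical helper: two exclusive locks end up held under one file name.
def two_locks_one_name
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'file1')
    flags = File::CREAT | File::RDWR
    mode = File::LOCK_EX | File::LOCK_NB
    old = File.open(path, flags)
    fresh = nil
    begin
      old_locked = old.flock(mode) != false
      File.unlink(path)              # the name goes away, the lock stays
      fresh = File.open(path, flags) # same name, brand-new file
      fresh_locked = fresh.flock(mode) != false
      {
        old_locked: old_locked,
        fresh_locked: fresh_locked,
        same_file: old.stat.ino == fresh.stat.ino
      }
    ensure
      old.close
      fresh&.close
    end
  end
end

p two_locks_one_name
```

Both non-blocking lock attempts succeed because the two descriptors refer to different i-nodes, which the differing `stat.ino` values make visible.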

To solve this issue we need to synchronise these operations through some central entry point. It can be implemented by introducing a singleton service which will provide this synchronisation using a Mutex or primitives from Concurrent Ruby.

Here is one possible PoC implementation:

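require 'singleton'
require 'fileutils'

class FS
  include Singleton

  def initialize
    @mutex = Mutex.new
    @files = {} # absolute path => number of blocks currently using it
  end

  def open(path)
    path = File.absolute_path(path)
    file = nil
    @mutex.synchronize do
      # Open and count under the mutex so no other thread can unlink in between
      file = File.open(path, File::CREAT | File::RDWR)
      @files[path] = @files.fetch(path, 0) + 1
    end

    yield file
  ensure
    @mutex.synchronize do
      if file # nil when File.open itself raised, so there is nothing to undo
        file.close
        ref_count = @files[path] - 1
        if ref_count.zero?
          # Last user: remove the name while no other thread can open it
          FileUtils.rm(path, force: true)
          @files.delete(path)
        else
          @files[path] = ref_count
        end
      end
    end
  end
end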

And here is the example from your question, rewritten:

FS.instance.open('a.txt') do |f|
  if f.flock(File::LOCK_EX | File::LOCK_NB)
    # you can be sure that you have a lock
  end
  # 'a.txt' will finally be deleted
end
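The `if` above relies on how File#flock reports its result when File::LOCK_NB is given: it returns false if the lock is held elsewhere and 0 (truthy in Ruby) on success. A small sketch, assuming a POSIX system and using hypothetical names, with two descriptors for the same file in one process:

```ruby
require 'tempfile'

# Hypothetical helper: collect flock return values for two descriptors.
def flock_return_values
  Tempfile.create('flock-demo') do |tmp|
    holder = File.open(tmp.path, File::RDWR)
    contender = File.open(tmp.path, File::RDWR)
    begin
      mode = File::LOCK_EX | File::LOCK_NB
      first = holder.flock(mode)            # lock is free
      blocked = contender.flock(mode)       # lock is held by holder
      holder.flock(File::LOCK_UN)           # explicit release
      after_release = contender.flock(mode) # lock is free again
      [first, blocked, after_release]
    ensure
      holder.close
      contender.close
    end
  end
end

p flock_return_values # => [0, false, 0] on a typical POSIX system
```

Because 0 is truthy in Ruby, testing the return value with a plain `if` is enough; comparing it to `true` would never match.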

Upvotes: 3
