Adrián

Reputation: 85

flock() between PHP and C edge case

I have a PHP script that receives invoices and saves them as files on Linux. Later, a C++ program running in an infinite loop reads each file and processes it. I want the latter to read each file safely (only after it has been fully written).

PHP side code simplification:

file_put_contents("sampleDir/invoice.xml", "contents", LOCK_EX);

On the C++ side (using the C file API), note that I want to keep the code that deletes empty files from the designated invoices folder. It handles the edge case of an empty file being created by some other source (not the PHP script).

Now, here's a C++ side code simplification, too:

FILE* pInvoiceFile = fopen("sampleDir/invoice.xml", "r");

if (pInvoiceFile != NULL)
{
    // fileno() is the portable way to get the descriptor behind a FILE*.
    if (flock(fileno(pInvoiceFile), LOCK_SH) == 0)
    {
        struct stat fileStat;
        fstat(fileno(pInvoiceFile), &fileStat);
        string invoice;
        invoice.resize(fileStat.st_size);

        if (fread((char*)invoice.data(), 1, fileStat.st_size, pInvoiceFile) < 1)
        {
            remove("sampleDir/invoice.xml"); // Edge case resolution
        }

        flock(fileno(pInvoiceFile), LOCK_UN);
    }

    fclose(pInvoiceFile); // Only close a stream that was actually opened.
}

As you can see, the key idea is that the LOCK_EX and LOCK_SH flags cooperate.

My problem is that, while this integration has been working fine, yesterday I noticed the edge case executed for an invoice which should not be empty, and thus it got deleted by the C++ program.

PHP manual on file_put_contents mentions the following for the LOCK_EX flag:

Acquire an exclusive lock on the file while proceeding to the writing. In other words, a flock() call happens between the fopen() call and the fwrite() call. This is not identical to an fopen() call with mode "x".

Upvotes: 0

Views: 96

Answers (1)

Marco Bonelli

Reputation: 69367

Your code assumes that the file_put_contents() operation is atomic, and that using LOCK_EX and LOCK_SH is enough to prevent race conditions between the two programs. This is not the case.

As you can see from the PHP doc, LOCK_EX is applied after opening the file. This is important, because it leaves a short window of time for the C++ program to successfully open the file and lock it with LOCK_SH. At that point the file has already been truncated by the fopen() done by PHP, and it's empty.

What's most likely happening is:

  1. PHP code opens the file for writing, truncating it and effectively wiping out its content.
  2. C++ code opens the file for reading.
  3. C++ code requests the shared lock on the file: the lock is granted.
  4. PHP code requests the exclusive lock on the file: the call blocks, waiting for the lock to be available.
  5. C++ code reads the file's contents: nothing, the file is empty.
  6. C++ code deletes the file.
  7. C++ code releases the shared lock.
  8. PHP code acquires the exclusive lock.
  9. PHP code writes to the file: the write itself succeeds, but the file has already been unlinked, so the data goes to an inode that no longer has a name and is discarded as soon as PHP closes the file.
  10. You are effectively left with no file and the data is lost.
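
The last steps of that sequence can be replayed deterministically in a single process. The following is only a minimal sketch, assuming a POSIX system; the helper name and the invoice payload are made up for illustration, and it deliberately leaves out the locks, since they do not change the outcome once the file has been unlinked.

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <string>

// Hypothetical helper: replays steps 1, 6, 9 and 10 in one process.
// Returns true only if `path` still exists after the "writer" is done
// (it also returns false if opening or writing fails).
bool fileSurvivesUnlinkDuringWrite(const std::string& path)
{
    // Step 1: the writer opens the file, truncating it.
    int writerFd = open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (writerFd < 0)
        return false;

    // Step 6: the reader saw an empty file and removed its directory entry.
    unlink(path.c_str());

    // Step 9: the write succeeds, but it lands in an inode with no name.
    const char data[] = "<invoice/>";
    if (write(writerFd, data, sizeof data - 1) < 0) {
        close(writerFd);
        return false;
    }

    // Step 10: closing the last descriptor frees the unlinked inode.
    close(writerFd);

    struct stat st;
    return stat(path.c_str(), &st) == 0;
}
```

Calling this helper on a scratch path returns false: the write reports success, yet no file is left behind.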

The problem with your code is that the operations you are doing on the file from two different programs are not atomic, and the way you are acquiring the locks does not help in ensuring that those don't overlap.

The only sane way of guaranteeing the atomicity of such an operation on a POSIX compliant system, without even worrying about file locking, is to take advantage of the atomicity of rename(2):

If newpath already exists, it will be atomically replaced, so that there is no point at which another process attempting to access newpath will find it missing.

If newpath exists but the operation fails for some reason, rename() guarantees to leave an instance of newpath in place.

The equivalent rename() PHP function is what you should use in this case. It's the simplest way to guarantee atomic updates to a file.
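
For reference, the same write-then-rename pattern can also be expressed directly with the POSIX calls. This is a rough C++ sketch rather than a drop-in implementation: writeFileAtomically is a hypothetical name, and placing the temporary file next to the destination is an assumption made here so that rename() stays within one filesystem.

```cpp
#include <unistd.h>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: writes `contents` to a temporary file next to
// `finalPath`, then renames it over `finalPath`.
bool writeFileAtomically(const std::string& finalPath, const std::string& contents)
{
    // mkstemp() needs a writable, NUL-terminated buffer ending in "XXXXXX".
    const std::string tmpl = finalPath + ".XXXXXX";
    std::vector<char> tmpPath(tmpl.begin(), tmpl.end());
    tmpPath.push_back('\0');

    int fd = mkstemp(tmpPath.data());
    if (fd < 0)
        return false;

    // write() may write fewer bytes than requested, so loop until done.
    size_t done = 0;
    while (done < contents.size()) {
        ssize_t n = write(fd, contents.data() + done, contents.size() - done);
        if (n < 0) {
            close(fd);
            unlink(tmpPath.data());
            return false;
        }
        done += static_cast<size_t>(n);
    }
    if (close(fd) != 0) {
        unlink(tmpPath.data());
        return false;
    }

    // The destination name switches from old to new contents in one step.
    if (std::rename(tmpPath.data(), finalPath.c_str()) != 0) {
        unlink(tmpPath.data());
        return false;
    }
    return true;
}
```

With this layout the temporary file is briefly visible in the destination directory, so a reader that scans that directory would have to skip such names. Note also that mkstemp() creates the file with mode 0600, which may need adjusting if the reader runs as a different user.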

What I would suggest is the following:

  • PHP code:

    $tmpfname = tempnam("/tmp", "myprefix");     // Create a temporary file.
    file_put_contents($tmpfname, "contents");    // Write to the temporary file.
    rename($tmpfname, "sampleDir/invoice.xml");  // Atomically replace invoice.xml by renaming the temporary file over it.
    
    // NOTE: rename(2) is only atomic within a single filesystem, so the
    // temporary directory must be on the same filesystem as sampleDir.
    // TODO: check for errors in all the above calls, most importantly tempnam().
    
  • C++ code:

    FILE* pInvoiceFile = fopen("sampleDir/invoice.xml", "r");
    
    if (pInvoiceFile != NULL)
    {
        struct stat fileStat;
        fstat(fileno(pInvoiceFile), &fileStat);
    
        string invoice;
        invoice.resize(fileStat.st_size);
    
        size_t n = fread(&invoice[0], 1, fileStat.st_size, pInvoiceFile);
        fclose(pInvoiceFile);
    
        if (n == 0)
            remove("sampleDir/invoice.xml");
    }
    

This way, the C++ program will always either see the old version of the file (if fopen() happens before PHP's rename()) or the new version of the file (if fopen() happens after), but it will never see an inconsistent version of the file.
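
The "old version or new version, never a mix" behaviour can be checked with a small experiment. The sketch below is only illustrative and assumes a POSIX filesystem; the function name, the ".tmp" suffix and the file contents are all hypothetical. It opens the file, replaces it via rename() while the stream is still open, and then reads from the stream that was opened first.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical experiment: returns what a reader sees when it opened the
// file *before* the writer renamed a new version over it.
std::string contentSeenByEarlyReader(const std::string& path)
{
    const std::string tmpPath = path + ".tmp"; // hypothetical temporary name

    // Create the old version of the file.
    {
        std::ofstream out(path);
        out << "old version";
    }

    // The reader opens the file while the old version is in place.
    FILE* reader = std::fopen(path.c_str(), "r");
    if (reader == NULL)
        return "";

    // The writer prepares the new version and renames it over the old one.
    {
        std::ofstream out(tmpPath);
        out << "new version";
    }
    if (std::rename(tmpPath.c_str(), path.c_str()) != 0) {
        std::fclose(reader);
        return "";
    }

    // The open stream still refers to the old inode, so it reads the
    // complete old contents.
    char buf[64];
    size_t n = std::fread(buf, 1, sizeof buf, reader);
    std::fclose(reader);
    return std::string(buf, n);
}
```

The early reader gets the complete old contents, while any fopen() issued after the rename() gets the complete new contents.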

Upvotes: 1
