Eugene D. Gubenkov

Reputation: 5357

Concurrent File.Move of the same file

It was clearly stated here that File.Move is an atomic operation: Atomicity of File.Move.

But the following code snippet makes it appear that the same file was moved multiple times.

Does anyone know what is wrong with this code?

using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

namespace FileMoveTest
{
    class Program
    {
        static void Main(string[] args)
        {
            string path = "test/" + Guid.NewGuid().ToString();

            CreateFile(path, new string('a', 10 * 1024 * 1024));

            var tasks = new List<Task>();

            for (int i = 0; i < 10; i++)
            {
                var task = Task.Factory.StartNew(() =>
                {
                    try
                    {
                        string newPath = path + "." + Guid.NewGuid();

                        File.Move(path, newPath);

                        // this line does NOT solve the issue
                        if (File.Exists(newPath))
                            Console.WriteLine(string.Format("Moved {0} -> {1}", path, newPath));
                    }
                    catch (Exception e)
                    {
                        Console.WriteLine(string.Format("  {0}: {1}", e.GetType(), e.Message));
                    }
                });

                tasks.Add(task);
            }

            Task.WaitAll(tasks.ToArray());
        }

        static void CreateFile(string path, string content)
        {
            string dir = Path.GetDirectoryName(path);

            if (!Directory.Exists(dir))
            {
                Directory.CreateDirectory(dir);
            }

            using (FileStream f = new FileStream(path, FileMode.OpenOrCreate))
            {
                using (StreamWriter w = new StreamWriter(f))
                {
                    w.Write(content);
                }
            }
        }
    }
}

The paradoxical output is below. It seems that the file was moved multiple times to different locations, yet only one of them is present on disk. Any thoughts?

Moved test/eb85560d-8c13-41c1-926a-6871be030742 -> test/eb85560d-8c13-41c1-926a-6871be030742.0018d317-ed7c-4732-92ac-3bb974d29017
Moved test/eb85560d-8c13-41c1-926a-6871be030742 -> test/eb85560d-8c13-41c1-926a-6871be030742.3965dc15-7ef9-4f36-bdb7-94a5939b17db
Moved test/eb85560d-8c13-41c1-926a-6871be030742 -> test/eb85560d-8c13-41c1-926a-6871be030742.fb66306a-5a13-4f26-ade2-acff3fb896be
Moved test/eb85560d-8c13-41c1-926a-6871be030742 -> test/eb85560d-8c13-41c1-926a-6871be030742.c6de8827-aa46-48c1-b036-ad4bf79eb8a9
System.IO.FileNotFoundException: Could not find file 'C:\file-move-test\test\eb85560d-8c13-41c1-926a-6871be030742'.
System.IO.FileNotFoundException: Could not find file 'C:\file-move-test\test\eb85560d-8c13-41c1-926a-6871be030742'.
System.IO.FileNotFoundException: Could not find file 'C:\file-move-test\test\eb85560d-8c13-41c1-926a-6871be030742'.
System.IO.FileNotFoundException: Could not find file 'C:\file-move-test\test\eb85560d-8c13-41c1-926a-6871be030742'.
System.IO.FileNotFoundException: Could not find file 'C:\file-move-test\test\eb85560d-8c13-41c1-926a-6871be030742'.
System.IO.FileNotFoundException: Could not find file 'C:\file-move-test\test\eb85560d-8c13-41c1-926a-6871be030742'.

The resulting file is here:

eb85560d-8c13-41c1-926a-6871be030742.fb66306a-5a13-4f26-ade2-acff3fb896be

UPDATE. I can confirm that checking File.Exists does NOT solve the issue either: the code can still report that a single file was moved to several different locations.

SOLUTION. The solution I ended up with is the following: before operating on the source file, create a special "lock" file. If the creation succeeds, we can be sure that only this thread has exclusive access to the file, and it is safe to do anything we want with it. Below is the right set of parameters to create such a "lock" file.

File.Open(lockPath, FileMode.CreateNew, FileAccess.Write);
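
A minimal sketch of how that lock-file idea could wrap the move. The helper name TryMoveExclusive and the ".lock" suffix are assumptions made for illustration, not part of the original code. It relies on FileMode.CreateNew throwing an IOException when the file already exists, so only the first caller gets past the File.Open call.

```csharp
using System;
using System.IO;

static class LockFileMove
{
    // Hypothetical helper: returns true only for the caller that managed
    // to create the lock file and therefore performed the move.
    public static bool TryMoveExclusive(string sourcePath, string destPath)
    {
        // Assumed naming convention for the lock file.
        string lockPath = sourcePath + ".lock";

        FileStream lockStream;
        try
        {
            // CreateNew fails with IOException if lockPath already exists,
            // so at most one caller can succeed here.
            lockStream = File.Open(lockPath, FileMode.CreateNew, FileAccess.Write);
        }
        catch (IOException)
        {
            return false; // another thread or process already owns the lock
        }

        using (lockStream)
        {
            File.Move(sourcePath, destPath);
        }

        // The lock file is deliberately left in place in this sketch, so that
        // later callers keep failing fast instead of retrying the move.
        return true;
    }
}
```

In the test program above, the body of the try block would then become something like `if (LockFileMove.TryMoveExclusive(path, newPath)) Console.WriteLine(...)`. Cleaning up the leftover lock file is left to the caller.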

Upvotes: 3

Views: 2206

Answers (1)

Peter Duniho

Reputation: 70671

Does anyone know what is wrong with this code?

I guess that depends on what you mean by "wrong".

The behavior you're seeing is not IMHO unexpected, at least if you're using NTFS (other file systems may or may not behave similarly).

The documentation for the underlying OS API (MoveFile() and MoveFileEx() functions) is not specific, but in general the APIs are thread-safe, in that they guarantee the file system will not be corrupted by concurrent operations (of course, your own data could be corrupted, but it will be done in a file-system-coherent way).

Most likely what is occurring is that as the move-file operation proceeds, it does so by first getting the actual file handle from the given directory link to it (in NTFS, all "file names" that you see are actually hard links to an underlying file object). Having obtained that file handle, the API then creates a new file name for the underlying file object (i.e. as a hard link), and then deletes the previous hard link.

Of course, as this progresses, there is a window between the moment a thread obtains the underlying file handle and the moment the original hard link is deleted. During that window, some but not all of the other concurrent move operations can appear to succeed. Eventually the original hard link no longer exists, and further attempts to move it fail.

No doubt the above is an oversimplification. File system behaviors can be complex. In particular, your stated observation is that you only wind up with a single instance of the file when all is said and done. This suggests that the API does also somehow coordinate the various operations, such that only one of the newly-created hard links survives, probably by virtue of the API actually just renaming the associated hard link after retrieving the file object handle, as opposed to creating a new one and deleting the old one (implementation detail).


At the end of the day, what's "wrong" with the code is that it is intentionally attempting to perform concurrent operations on a single file. While the file system itself will ensure that it remains coherent, it's up to your own code to ensure that such operations are coordinated so that the results are predictable and reliable.
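
As one possible illustration of such coordination, here is a rough sketch that lets only one thread per source path attempt the move. It assumes all the competing threads live in the same process; the class and method names are made up for this example. For coordination across processes, something like a lock file or a named mutex would be needed instead.

```csharp
using System.Collections.Concurrent;
using System.IO;

static class CoordinatedMove
{
    // Source paths that some thread in this process has already claimed.
    private static readonly ConcurrentDictionary<string, bool> Claimed =
        new ConcurrentDictionary<string, bool>();

    // Hypothetical helper: returns true only for the single caller that
    // claimed the path first and therefore performed the move.
    public static bool TryMoveOnce(string sourcePath, string destPath)
    {
        // Normalize so that relative and absolute spellings of the same
        // path map to the same key.
        string key = Path.GetFullPath(sourcePath);

        // TryAdd is atomic: exactly one caller per key gets true.
        if (!Claimed.TryAdd(key, true))
            return false;

        File.Move(sourcePath, destPath);
        return true;
    }
}
```

With this in place, each task in the question's loop would call `CoordinatedMove.TryMoveOnce(path, newPath)` and print "Moved" only when it returns true, so at most one task ever reaches File.Move for a given source.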

Upvotes: 4
