user1884330

Reputation: 115

Writing to a file from multiple threads without lock

I need to write data, buffer by buffer, into a file from different threads. To avoid locking, I am writing to separate files, say 'file_1', 'file_2', and finally merging all of them into 'file'. Is this a good approach? Are there any better suggestions?

Some files are huge and contain thousands of buffers, so thousands of temp files are created and later merged and cleaned up.

Upvotes: 3

Views: 3539

Answers (2)

Matthew Watson

Reputation: 109537

Here's a sample approach (with no error handling!) showing how to use a BlockingCollection to manage a queue of buffers to write to a file.

The idea is you create a ParallelFileWriter and then use it in all the threads that want to write to a file. When you're done, just dispose it (but make sure you don't dispose it until all threads have finished writing to it!).

This is just a simple example to get you started - you'd need to add argument checking and error handling:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

namespace Demo
{
    public sealed class ParallelFileWriter: IDisposable
    {
        // maxQueueSize is the maximum number of buffers you want in the queue at once.
        // If this value is reached, any threads calling Write() will block until there's
        // room in the queue.

        public ParallelFileWriter(string filename, int maxQueueSize)
        {
            _stream     = new FileStream(filename, FileMode.Create);
            _queue      = new BlockingCollection<byte[]>(maxQueueSize);
            _writerTask = Task.Run(() => writerTask());
        }

        public void Write(byte[] data)
        {
            _queue.Add(data);
        }

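        public void Dispose()
        {
            // Signal that no more buffers will be added, wait for the writer task
            // to drain the queue, then release the stream and the queue.
            _queue.CompleteAdding();
            _writerTask.Wait();
            _stream.Dispose();
            _queue.Dispose();
        }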

        private void writerTask()
        {
            foreach (var data in _queue.GetConsumingEnumerable())
            {
                Debug.WriteLine("Queue size = {0}", _queue.Count);
                _stream.Write(data, 0, data.Length);
            }
        }

        private readonly Task _writerTask;
        private readonly BlockingCollection<byte[]> _queue;
        private readonly FileStream _stream;
    }

    class Program
    {
        private void run()
        {
            // For demo purposes, cancel after a couple of seconds.

            using (var fileWriter = new ParallelFileWriter(@"C:\TEST\TEST.DATA", 100))
            using (var cancellationSource = new CancellationTokenSource(2000))
            {
                const int NUM_THREADS = 8;
                Action[] actions = new Action[NUM_THREADS];

                for (int i = 0; i < NUM_THREADS; ++i)
                {
                    int id = i;
                    actions[i] = () => writer(cancellationSource.Token, fileWriter, id);
                }

                Parallel.Invoke(actions);
            }
        }

        private void writer(CancellationToken cancellation, ParallelFileWriter fileWriter, int id)
        {
            int index = 0;

            while (!cancellation.IsCancellationRequested)
            {
                string text = string.Format("{0}:{1}\n", id, index++);
                byte[] data = Encoding.UTF8.GetBytes(text);
                fileWriter.Write(data);
            }
        }

        static void Main(string[] args)
        {
            new Program().run();
        }
    }
}

Upvotes: 5

djna

Reputation: 55897

My instinct is that massaging files is going to be expensive, and managing thousands of files sounds complex and error-prone.

How about having a dedicated thread do the writing instead? Other threads simply add their messages to a queue to be written. Although there would be some synchronisation overhead, the actual work done inside the lock is very small: just copying a "pointer" to a message into a queue. As opening files and writing to them may be more expensive than taking a mutex, you may actually improve performance.
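To make the "small amount of work in the lock" point concrete, here is a minimal sketch of that idea using a plain Queue guarded by a lock, with Monitor.Wait/Pulse to wake the writer thread. The class name QueuedFileWriter and its members are made up for illustration, the queue is unbounded (so fast producers can use a lot of memory), and there is no error handling, so treat it as a starting point rather than a finished implementation:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

// Hypothetical helper: one dedicated thread owns the stream; producers only enqueue.
public sealed class QueuedFileWriter : IDisposable
{
    private readonly Queue<byte[]> _pending = new Queue<byte[]>();
    private readonly object _gate = new object();
    private readonly Stream _output;   // owned by the caller, not closed here
    private readonly Thread _writerThread;
    private bool _finished;

    public QueuedFileWriter(Stream output)
    {
        _output = output;
        _writerThread = new Thread(WriteLoop) { IsBackground = true };
        _writerThread.Start();
    }

    // Safe to call from any thread. Only a reference is stored while the lock is held.
    public void Enqueue(byte[] message)
    {
        lock (_gate)
        {
            if (_finished)
                throw new InvalidOperationException("Writer has been disposed.");

            _pending.Enqueue(message);
            Monitor.Pulse(_gate);   // wake the writer thread if it is waiting
        }
    }

    private void WriteLoop()
    {
        while (true)
        {
            byte[] next;

            lock (_gate)
            {
                while (_pending.Count == 0 && !_finished)
                    Monitor.Wait(_gate);

                if (_pending.Count == 0)
                    return;         // finished and fully drained

                next = _pending.Dequeue();
            }

            // The slow part (the actual I/O) happens outside the lock.
            _output.Write(next, 0, next.Length);
        }
    }

    // Call only after all producer threads have stopped enqueuing.
    public void Dispose()
    {
        lock (_gate)
        {
            _finished = true;
            Monitor.PulseAll(_gate);
        }

        _writerThread.Join();       // wait until the queue has been written out
        _output.Flush();
    }
}
```

You would wrap a FileStream in this (for example, new QueuedFileWriter(new FileStream(path, FileMode.Create))), call Enqueue from the worker threads, and dispose the writer and then the stream once all workers have finished. Buffers from different threads end up interleaved in arrival order, so if the final file needs a particular order you would have to carry a sequence number or offset with each message.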

Upvotes: 9
