Alireza Noori

Reputation: 15253

Multi-threaded C# queue in .NET 4

I'm developing a simple crawler for web pages. I've searched and found a lot of solutions for implementing multi-threaded crawlers. What is the best way to create a thread-safe queue that contains unique URLs?

EDIT: Is there a better solution in .NET 4.5?

Upvotes: 1

Views: 2523

Answers (5)

Aliostad

Reputation: 81660

Use the Task Parallel Library with the default scheduler, which runs tasks on the ThreadPool.

OK, this is a minimal implementation that keeps at most 50 URLs in flight at a time:

    // Requires: using System; using System.Threading; using System.Threading.Tasks;
    public static void WebCrawl(Func<string> getNextUrlToCrawl, // returns a URL or null if no more URLs 
        Action<string> crawlUrl, // action to crawl the URL 
        int pauseInMilli // if all threads engaged, waits for n milliseconds
        )
    {
        const int maxQueueLength = 50;
        string currentUrl = null;
        int queueLength = 0;

        while ((currentUrl = getNextUrlToCrawl()) != null)
        {
            string temp = currentUrl;

            // Wait for a free slot instead of dropping the URL when the limit is reached.
            while (Interlocked.CompareExchange(ref queueLength, 0, 0) >= maxQueueLength)
            {
                Thread.Sleep(pauseInMilli);
            }

            // Count the task before it starts so the limit check above stays accurate.
            Interlocked.Increment(ref queueLength);
            Task.Factory.StartNew(() => crawlUrl(temp))
                .ContinueWith((t) =>
                {
                    if (t.IsFaulted)
                        Console.WriteLine(t.Exception.ToString());
                    else
                        Console.WriteLine("Successfully done!");
                    Interlocked.Decrement(ref queueLength);
                });
        }
    }

Dummy usage:

    static void Main(string[] args)
    {
        Random r = new Random();
        int i = 0;
        WebCrawl(() => (i = r.Next()) % 100 == 0 ? null : ("Some URL: " + i.ToString()),
            (url) => Console.WriteLine(url),
            500);

        Console.Read();

    }

Upvotes: 2

Ohad Schneider

Reputation: 38112

ConcurrentQueue is indeed the framework's thread-safe queue implementation. But since you're likely to use it in a producer-consumer scenario, the class you're really after may be the infinitely useful BlockingCollection.
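
To illustrate the producer-consumer shape, here is a rough sketch rather than a finished crawler. The class name `BlockingCrawlSketch`, the `Run` helper, the capacity of 100, and the "processing" step (which only records the URL) are all assumptions made for the example. The `BlockingCollection<T>` members it uses (`Add`, `CompleteAdding`, `GetConsumingEnumerable`) are available in .NET 4.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BlockingCrawlSketch
{
    // Hypothetical helper: hands every URL to a fixed number of consumers
    // and returns the URLs that were processed.
    public static List<string> Run(IEnumerable<string> urls, int consumerCount)
    {
        var processed = new ConcurrentBag<string>();

        // Bounded to 100 waiting items (an arbitrary figure for this sketch).
        using (var queue = new BlockingCollection<string>(new ConcurrentQueue<string>(), 100))
        {
            var consumers = new Task[consumerCount];
            for (int i = 0; i < consumerCount; i++)
            {
                consumers[i] = Task.Factory.StartNew(() =>
                {
                    // Blocks while the queue is empty; the loop ends once
                    // CompleteAdding has been called and the queue is drained.
                    foreach (string url in queue.GetConsumingEnumerable())
                    {
                        processed.Add(url); // stand-in for the real download
                    }
                }, TaskCreationOptions.LongRunning);
            }

            foreach (string url in urls)
            {
                queue.Add(url); // blocks while 100 items are already waiting
            }
            queue.CompleteAdding();

            Task.WaitAll(consumers);
        }

        return new List<string>(processed);
    }
}
```

In a real crawler the consumers would also add newly discovered links, so deciding when to call `CompleteAdding` takes more care than it does here.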

Upvotes: 2

Martin James

Reputation: 24857

Look at System.Collections.Concurrent.ConcurrentQueue. If you need consumers to wait until an item is available, you could use System.Collections.Concurrent.BlockingCollection.

Upvotes: 1

flytzen

Reputation: 7438

I'd use System.Collections.Concurrent.ConcurrentQueue.

You can safely queue and dequeue from multiple threads.
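
ConcurrentQueue on its own does not reject duplicates, so the "unique URLs" part of the question needs something extra. One possible approach, sketched here as an assumption rather than an established pattern, is to pair the queue with a `ConcurrentDictionary<string, byte>` used as a set. The wrapper name `UniqueUrlQueue` and its methods are hypothetical, and URLs are compared as plain strings, so any normalization is left to the caller.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical wrapper, not a framework type.
public class UniqueUrlQueue
{
    private readonly ConcurrentQueue<string> _queue = new ConcurrentQueue<string>();

    // Used as a concurrent set: the byte value is ignored.
    private readonly ConcurrentDictionary<string, byte> _seen =
        new ConcurrentDictionary<string, byte>();

    // Returns true if the URL had not been seen before and was queued.
    public bool TryEnqueue(string url)
    {
        if (!_seen.TryAdd(url, 0))
            return false; // already queued at some earlier point

        _queue.Enqueue(url);
        return true;
    }

    // Returns false when the queue is currently empty.
    public bool TryDequeue(out string url)
    {
        return _queue.TryDequeue(out url);
    }
}
```

Because `TryAdd` is atomic, two threads that offer the same URL at the same time cannot both enqueue it. The set is never trimmed, so a URL stays "seen" after it has been dequeued, which is usually what a crawler wants.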

Upvotes: 1

Simon Cowen

Reputation: 1903

Would System.Collections.Concurrent.ConcurrentQueue<T> fit the bill?

Upvotes: 1
