Reputation: 15253
I'm developing a simple crawler for web pages. I've searched and found a lot of solutions for implementing multi-threaded crawlers. What is the best way to create a thread-safe queue to contain unique URLs?
EDIT: Is there a better solution in .NET 4.5?
Upvotes: 1
Views: 2523
Reputation: 81660
Use the Task Parallel Library with its default scheduler, which runs tasks on the ThreadPool.
OK, this is a minimal implementation which queues up to 50 URLs at a time (maxQueueLength):
// Requires: using System; using System.Threading; using System.Threading.Tasks;
public static void WebCrawl(Func<string> getNextUrlToCrawl, // returns a URL, or null if there are no more URLs
    Action<string> crawlUrl, // action to crawl the URL
    int pauseInMilli // if all slots are busy, waits for n milliseconds
    )
{
    const int maxQueueLength = 50;
    string currentUrl = null;
    int queueLength = 0;
    while ((currentUrl = getNextUrlToCrawl()) != null)
    {
        string temp = currentUrl; // per-iteration copy for the closure
        // Wait for a free slot; sleeping and then fetching the next URL would drop this one.
        while (Interlocked.CompareExchange(ref queueLength, 0, 0) >= maxQueueLength)
        {
            Thread.Sleep(pauseInMilli);
        }
        // Count the task before it starts, so the loop cannot overshoot the limit.
        Interlocked.Increment(ref queueLength);
        Task.Factory.StartNew(() => crawlUrl(temp))
            .ContinueWith((t) =>
            {
                if (t.IsFaulted)
                    Console.WriteLine(t.Exception.ToString());
                else
                    Console.WriteLine("Successfully done!");
                Interlocked.Decrement(ref queueLength);
            });
    }
}
Dummy usage:
static void Main(string[] args)
{
    Random r = new Random();
    int i = 0;
    WebCrawl(() => (i = r.Next()) % 100 == 0 ? null : ("Some URL: " + i.ToString()),
        (url) => Console.WriteLine(url),
        500);
    Console.Read();
}
Upvotes: 2
Reputation: 38112
ConcurrentQueue is indeed the framework's thread-safe queue implementation. But since you're likely to use it in a producer-consumer scenario, the class you're really after may be the infinitely useful BlockingCollection.
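To make the producer-consumer idea concrete, here is a minimal sketch, not a definitive implementation. The class name BlockingCrawlSketch, the method Run, the capacity of 100 and the ConcurrentDictionary used as a "seen" set for uniqueness are all assumptions for illustration; only BlockingCollection<T> itself comes from the answer above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BlockingCrawlSketch
{
    // Hypothetical helper: crawls each distinct URL in `seeds` using
    // `workerCount` consumer tasks and returns how many URLs were crawled.
    public static int Run(IEnumerable<string> seeds, Action<string> crawlUrl, int workerCount)
    {
        // TryAdd on this dictionary succeeds only for the first caller per URL,
        // which is what keeps the queued URLs unique (assumption: ordinal comparison).
        var seen = new ConcurrentDictionary<string, byte>();
        int crawled = 0;

        // Bounded capacity (assumed value): Add blocks the producer when the queue is full.
        using (var queue = new BlockingCollection<string>(100))
        {
            var workers = new Task[workerCount];
            for (int w = 0; w < workerCount; w++)
            {
                workers[w] = Task.Factory.StartNew(() =>
                {
                    // Blocks while the queue is empty; ends after CompleteAdding
                    // has been called and the queue has been drained.
                    foreach (string url in queue.GetConsumingEnumerable())
                    {
                        crawlUrl(url);
                        Interlocked.Increment(ref crawled);
                    }
                }, TaskCreationOptions.LongRunning);
            }

            // Producer: enqueue each URL only the first time it is seen.
            foreach (string url in seeds)
            {
                if (seen.TryAdd(url, 0))
                    queue.Add(url);
            }

            queue.CompleteAdding(); // signal that no more URLs are coming
            Task.WaitAll(workers);  // wait for the consumers to drain the queue
        }
        return crawled;
    }
}
```

In a real crawler the consumers also discover new links and add them back to the queue, so deciding when to call CompleteAdding takes more care than in this fixed-input sketch.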
Upvotes: 2
Reputation: 24857
Look at System.Collections.Concurrent.ConcurrentQueue<T>. If consumers need to block until an item is available, you could use System.Collections.Concurrent.BlockingCollection<T>.
Upvotes: 1
Reputation: 7438
I'd use System.Collections.Concurrent.ConcurrentQueue.
You can safely queue and dequeue from multiple threads.
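ConcurrentQueue on its own does not reject duplicates, so for the "unique URLs" part of the question one option is to pair it with a ConcurrentDictionary acting as a set. The following is a rough sketch under that assumption; the UniqueUrlQueue class and its member names are hypothetical, not part of the framework.

```csharp
using System.Collections.Concurrent;

// Hypothetical wrapper: a thread-safe FIFO queue that accepts each URL at most once.
public sealed class UniqueUrlQueue
{
    private readonly ConcurrentQueue<string> _queue = new ConcurrentQueue<string>();

    // Used as a concurrent set; the byte value is a dummy.
    private readonly ConcurrentDictionary<string, byte> _seen =
        new ConcurrentDictionary<string, byte>();

    // Returns true if the URL was new and has been queued,
    // false if it was already enqueued at some earlier point.
    public bool TryEnqueue(string url)
    {
        if (!_seen.TryAdd(url, 0))
            return false;
        _queue.Enqueue(url);
        return true;
    }

    // Non-blocking: returns false immediately if the queue is currently empty.
    public bool TryDequeue(out string url)
    {
        return _queue.TryDequeue(out url);
    }

    public int Count
    {
        get { return _queue.Count; }
    }
}
```

URLs are compared as plain strings here, so "http://example.com" and "http://example.com/" count as different entries; any normalization would have to happen before calling TryEnqueue. The set also keeps growing for the lifetime of the crawl, since it remembers URLs after they are dequeued.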
Upvotes: 1
Reputation: 1903
Would System.Collections.Concurrent.ConcurrentQueue<T> fit the bill?
Upvotes: 1