Reputation: 14886
I am trying to understand threading better, and I have run into something that confuses me. As far as I know, Task.Run() starts the task on another thread.
I wrote the code below to see how it behaves, but there is a hole in my understanding.
I imagined that I could start tasks in a loop like this:
public void DoTheThings(List<string> inputList)
{
    List<Task> taskList = new List<Task>();
    foreach (var input in inputList)
    {
        taskList.Add(Task.Run(() => this.GetTheStuff(input)));
    }
    Task.WaitAll(taskList.ToArray());
    Console.WriteLine("Queue completed");
}
I assumed that if the called method (GetTheStuff()) had a delay in it, the delay would block that thread, so the next task would start on a new thread:
public async Task GetTheStuff(string input)
{
    Console.WriteLine("Thread " + Thread.CurrentThread.ManagedThreadId + " starting");
    int delay = GetRandomNumber(1000, 5000); // simulate the time of an HTTP request or something similar
    var notInUse = input; // in the real app this would be some useful assignment
    await Task.Delay(delay);
    Console.WriteLine("Thread " + Thread.CurrentThread.ManagedThreadId + " ending");
}
But this doesn't happen. The same threads are used to start multiple tasks, or so it seems from the ManagedThreadId printed at the start and end of the function.
In my erroneous assumption, I thought that the Main() function would be a thread. It would launch a new thread for DoTheThings(), and that function would then launch multiple threads for concurrent GetTheStuff() processing.
What is actually happening?
Complete code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    private static void Main(string[] args)
    {
        // build list of 100 random strings to represent input
        List<string> thingsToProcess = new List<string>();
        for (int i = 0; i < 100; i++)
        {
            thingsToProcess.Add(Path.GetRandomFileName());
        }
        Console.WriteLine("Starting queue");
        var m = new MethodStuff();
        var mainTask = Task.Run(() => m.DoTheThings(thingsToProcess));
        Task.WaitAll(mainTask);
        Console.WriteLine("All done");
        Console.ReadLine();
    }
}
class MethodStuff
{
    private static readonly Random getrandom = new Random();
    private static readonly object syncLock = new object();
    public static int GetRandomNumber(int min, int max)
    {
        lock (syncLock)
        { // synchronize
            return getrandom.Next(min, max);
        }
    }
    // loop over all input and start each input in its own thread
    public void DoTheThings(List<string> inputList)
    {
        List<Task> taskList = new List<Task>();
        foreach (var input in inputList)
        {
            taskList.Add(Task.Run(() => this.GetTheStuff(input)));
        }
        Task.WaitAll(taskList.ToArray());
        Console.WriteLine("Queue completed");
    }
    public async Task GetTheStuff(string input)
    {
        Console.WriteLine("Thread " + Thread.CurrentThread.ManagedThreadId + " starting");
        int delay = GetRandomNumber(1000, 5000); // simulate the time of an HTTP request or something similar
        var notInUse = input; // in the real app this would be some useful assignment
        await Task.Delay(delay);
        Console.WriteLine("Thread " + Thread.CurrentThread.ManagedThreadId + " ending");
    }
}
Upvotes: 1
Views: 235
Reputation: 120480
To answer your specific question:
I believe that you're misunderstanding how the async/await keywords work.
I've commented your method:
public async Task GetTheStuff(string input)
{
    // this will always be the thread from
    // which this method was called
    Console.WriteLine("Thread " + Thread.CurrentThread.ManagedThreadId + " starting");
    int delay = GetRandomNumber(1000, 5000);
    var notInUse = input;
    // runs up to the await synchronously
    await Task.Delay(delay);
    // might be a different thread, depending on context...
    Console.WriteLine("Thread " + Thread.CurrentThread.ManagedThreadId + " ending");
}
The take-home lesson here is that until you hit the first await, your method runs synchronously (i.e. on the same thread it was called from). After the method resumes (i.e. the awaited task completes), it might now be running on a different thread, depending on the context in which it is being used.
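To see this in isolation, here is a minimal sketch, assuming a plain console app with no SynchronizationContext. The class and method names (AwaitThreadSketch, ObserveAsync, Demo) are made up for illustration. It records the thread id before and after the await so the two can be compared with the caller's id:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class AwaitThreadSketch
{
    // Hypothetical helper: returns the thread id seen before and after the await.
    public static async Task<(int Before, int After)> ObserveAsync()
    {
        // Runs synchronously on whichever thread called ObserveAsync().
        int before = Thread.CurrentThread.ManagedThreadId;

        // Control returns to the caller here; no thread is blocked while waiting.
        await Task.Delay(100);

        // The continuation runs wherever the context schedules it,
        // which in a console app is usually some thread pool thread.
        int after = Thread.CurrentThread.ManagedThreadId;
        return (before, after);
    }

    public static void Demo()
    {
        int caller = Thread.CurrentThread.ManagedThreadId;
        var (before, after) = ObserveAsync().GetAwaiter().GetResult();
        Console.WriteLine($"caller={caller} before={before} after={after}");
    }
}
```

In this setup "before" should match the caller's id, while "after" may or may not differ from it. That is why your start and end lines can show thread ids being reused across tasks.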
Upvotes: 1
Reputation: 120480
You should really ask a question about the problem you're actually trying to solve :)
From what I can gather, you're probably making synchronous HTTP requests and attempting to parallelize them by firing them off in Task.Run. This will queue them to the ThreadPool, which initially probably contains only as many threads as there are logical cores on your machine. Assuming your HTTP requests are made synchronously, each one will tie up the pool thread it is running on until the request completes. When you reach the same number of tasks as there are threads in the pool, the queue will then stall until either a running pool task completes or the ThreadPool decides to fire up another thread. The ThreadPool doesn't fire up threads in any sort of rush, so this can introduce all kinds of latency to the equation.
An excellent rule of thumb for getting high throughput is to never place blocking workloads into the ThreadPool. Synchronous HTTP is a blocking workload.
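As a rough illustration of that rule, the sketch below contrasts the two styles. Thread.Sleep stands in for a synchronous HTTP call and Task.Delay for an asynchronous one; both stand-ins, and the names BlockingVsAsyncSketch, Blocking and NonBlocking, are assumptions made for this example rather than anything from your code:

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class BlockingVsAsyncSketch
{
    // Each work item occupies a pool thread for the whole wait,
    // so throughput is capped by how many threads the pool has available.
    public static Task Blocking(int count, int milliseconds)
    {
        return Task.WhenAll(
            Enumerable.Range(0, count)
                      .Select(_ => Task.Run(() => Thread.Sleep(milliseconds))));
    }

    // Each delay is timer based and holds no thread while waiting,
    // so many of them can be in flight at once.
    public static Task NonBlocking(int count, int milliseconds)
    {
        return Task.WhenAll(
            Enumerable.Range(0, count)
                      .Select(_ => Task.Delay(milliseconds)));
    }
}
```

Timing the two with a Stopwatch for something like 100 items of one second each should show the blocking version taking noticeably longer on a machine with few cores, although the exact numbers depend on how quickly the pool adds threads.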
You should switch to async web requests, ideally using Task-based asynchrony with the async/await keywords. Done correctly, you'll be able to fire off thousands of requests without the ThreadPool even breaking a sweat (although your networking equipment may start to grumble... SOHO routers are pretty bad for this kind of thing).
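One way this could look is sketched below using HttpClient.GetStringAsync and Task.WhenAll. The class and method names (AsyncFetchSketch, FetchLengthsAsync, FetchLengthAsync) and the choice to return body lengths are illustrative assumptions, not something taken from your code:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

static class AsyncFetchSketch
{
    // A single shared client; creating one per request is generally discouraged.
    private static readonly HttpClient Client = new HttpClient();

    // Starts every request up front, then awaits them all together.
    public static async Task<int[]> FetchLengthsAsync(IEnumerable<string> urls)
    {
        // ToArray() forces the lazy Select to run, so all requests are
        // in flight before the first one is awaited.
        Task<int>[] pending = urls.Select(FetchLengthAsync).ToArray();
        return await Task.WhenAll(pending);
    }

    private static async Task<int> FetchLengthAsync(string url)
    {
        // No thread is blocked while the response is outstanding.
        string body = await Client.GetStringAsync(url);
        return body.Length;
    }
}
```

Note that there is no Task.Run here at all: the requests overlap because each call returns a Task as soon as it reaches its first await, not because each one gets its own thread.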
Other issues that might prevent high throughput:
If you're requesting from many different hosts, you might choose to use a third-party DNS library, because the .NET DNS lookup phase in web requests always runs synchronously. This sucks hard. You can then use the IP address returned by the library in an HttpWebRequest and manually set the Host property to the name of the host you're trying to reach. I've found that this can make a very significant difference to the performance of your HTTP requests.
If you're making a lot of requests to the same hosts, you'll probably want to tweak ServicePointManager.DefaultConnectionLimit so you can make more than 2 (or 6, depending on context) requests at once to a single host.
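Both tweaks can be sketched roughly as follows. The helper names, the limit of 20, and the idea that the IP address comes from some DNS library of your choosing are all assumptions made for illustration; the sketch targets the HttpWebRequest stack described above, and newer runtimes may flag these types as obsolete:

```csharp
using System.Net;

static class RequestTuningSketch
{
    // Raise the per-host connection cap, typically once at startup.
    // 20 is an arbitrary illustrative value, not a recommendation.
    public static void RaiseConnectionLimit(int limit = 20)
    {
        ServicePointManager.DefaultConnectionLimit = limit;
    }

    // Build a request addressed to an already-resolved IP address while
    // still sending the real host name in the Host header.
    public static HttpWebRequest CreateForResolvedHost(string ipAddress, string hostName)
    {
        HttpWebRequest request = WebRequest.CreateHttp("http://" + ipAddress + "/");
        request.Host = hostName;
        return request;
    }
}
```

A call such as RequestTuningSketch.CreateForResolvedHost("203.0.113.10", "example.com") only builds the request object; nothing is sent until you ask for the response. The address shown is a documentation placeholder, not a real server.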
Upvotes: 2