Asım Gündüz

Reputation: 1297

How to send multiple web requests and process them as fast as possible

I'm working on a C# WinForms app, and I have around 84 URLs that I want to parse using HTML Agility Pack.

For 84 records it takes 150 seconds to complete the job with the code below.

I was wondering what options I have to make it run faster. Any help is much appreciated!

The following is my code structure for doing the job:

public class URL_DATA
{
    public string URL { get; set; }
    public HtmlDocument doc { get; set; }
}

Then I call the function below to do the job:

public async Task ProcessUrls(string cookie)
{
    var tsk = new List<Task>();
    // UrlsToProcess is List<URL_DATA>
    UrlsToProcess.ForEach(async data =>
    {
        tsk.Add(Task.Run(async () =>
        {
            var htmToParse = await ScrapUtils.GetAgilityDocby(cookie, data.URL);

            var htmlDoc = new HtmlDocument();
            htmlDoc.LoadHtml(htmToParse);
            data.doc = htmlDoc;
        }));
    });
    await Task.WhenAll(tsk).ConfigureAwait(false);
}

And finally, below is the method I use to get the request string.

public static async Task<string> GetAgilityDocby(string cookie, string url)
{
    using (var wc = new WebClient())
    {
        wc.Proxy = null; // WebRequest.DefaultWebProxy; // GlobalProxySelection.GetEmptyWebProxy();
        wc.Headers.Add(HttpRequestHeader.Cookie, cookie);
        wc.Headers.Add(HttpRequestHeader.UserAgent,
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36");
        wc.Encoding = Encoding.UTF8;
        test++; // counter presumably used for debugging; defined elsewhere
        return await wc.DownloadStringTaskAsync(url).ConfigureAwait(false);
    }
}

Upvotes: 0

Views: 432

Answers (2)

FastJack

Reputation: 896

Try increasing the minimum number of ThreadPool threads:

ThreadPool.SetMinThreads(84,84);

This should speed things up a lot.
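If you go this route, the call belongs at application startup, before any requests are queued. A minimal sketch, assuming a standard WinForms entry point (the Program class and Form1 are placeholders, not code from the question):

using System;
using System.Threading;
using System.Windows.Forms;

static class Program
{
    [STAThread]
    static void Main()
    {
        // Raise the ThreadPool floor so the pool does not have to ramp up
        // one thread at a time under a burst of 84 concurrent requests.
        ThreadPool.SetMinThreads(workerThreads: 84, completionPortThreads: 84);

        Application.EnableVisualStyles();
        Application.Run(new Form1()); // Form1 stands in for the app's main form
    }
}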

As for the task creation pointed out by Ilya, I would recommend you omit the Task.Run / Task.WhenAll part completely and use the Parallel mechanism, which was developed for exactly this kind of problem:

Parallel.ForEach(UrlsToProcess, data =>
{
    // GetAgilityDocby is async, so block on its result here;
    // Parallel.ForEach does not support async delegates.
    var htmToParse = ScrapUtils.GetAgilityDocby(cookie, data.URL).GetAwaiter().GetResult();

    var htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(htmToParse);
    data.doc = htmlDoc;
});
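If 84 simultaneous downloads turn out to overwhelm the server or your bandwidth, Parallel.ForEach can also be throttled via ParallelOptions. A sketch of that variant (the limit of 16 is an arbitrary illustration, not a recommendation from this answer):

Parallel.ForEach(UrlsToProcess,
    new ParallelOptions { MaxDegreeOfParallelism = 16 }, // tune for your server and connection
    data =>
    {
        var htmToParse = ScrapUtils.GetAgilityDocby(cookie, data.URL).GetAwaiter().GetResult();

        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(htmToParse);
        data.doc = htmlDoc;
    });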

Upvotes: 0

Ilya Chernomordik

Reputation: 30205

You are using ForEach with an asynchronous lambda. I have a suspicion that this makes your code run sequentially instead of in parallel, since each iteration will await before the next one starts.

So, to figure that out for sure, here is what you can do:

  1. Check the maximum time of the operation for a single URL; the whole job should take roughly that long (if you have enough bandwidth, memory and CPU).
  2. Verify that your operations are indeed running in parallel, e.g. by outputting a counter to the console. The output should not be sequential; it should look random enough (see the sketch after this list).
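A minimal sketch of that counter check, assuming it is placed inside each per-URL task right after the download completes (the _completed field and the message format are illustrative):

// field on the class:
private static int _completed;

// inside the per-URL task body, after the download finishes:
int n = System.Threading.Interlocked.Increment(ref _completed);
Console.WriteLine($"finished #{n}: {data.URL}");
// Strictly ordered output (1, 2, 3, ... one at a time) suggests the requests
// run sequentially; interleaved, bursty output means they overlap as intended.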

For example, you can change your task-creation code to this to try it out:

var allTasks = myUrls.Select(url => Task.Run(() => { /* yourCode */ })).ToList();
await Task.WhenAll(allTasks);
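Applied to the ProcessUrls method from the question, that pattern might look like this (a sketch, assuming GetAgilityDocby and URL_DATA exactly as posted):

public async Task ProcessUrls(string cookie)
{
    // Select materializes one task per URL up front; nothing is awaited
    // in between, so all downloads start immediately.
    var allTasks = UrlsToProcess.Select(data => Task.Run(async () =>
    {
        var htmToParse = await ScrapUtils.GetAgilityDocby(cookie, data.URL);

        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(htmToParse);
        data.doc = htmlDoc;
    })).ToList();

    await Task.WhenAll(allTasks).ConfigureAwait(false);
}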

Upvotes: 1
