Max

Reputation: 49

Multiple HttpClients with proxies, trying to achieve maximum download speed

I need to use proxies to download a forum. The problem with my code is that it uses only about 10% of my internet bandwidth. I have also read that I should use a single HttpClient instance, but I don't know how to do that with multiple proxies. Changing MaxDegreeOfParallelism makes no difference.

public static IAsyncEnumerable<IFetchResult> FetchInParallelAsync(
    this IEnumerable<Url> urls, FetchContext context)
{
    var fetchBlcock = new TransformBlock<Url, IFetchResult>(
        transform: url => url.FetchAsync(context), 
        dataflowBlockOptions: new ExecutionDataflowBlockOptions 
        {
            MaxDegreeOfParallelism = 128
        }
    );
    foreach(var url in urls)
        fetchBlcock.Post(url);

    fetchBlcock.Complete();
    var result = fetchBlcock.ToAsyncEnumerable();
    return result;
}

Every call to FetchAsync will create or reuse a HttpClient with a WebProxy.

public static async Task<IFetchResult> FetchAsync(this Url url, FetchContext context)
{
    var httpClient = context.ProxyPool.Rent();
    try
    {
        return await url.FetchAsync(httpClient, context.Observer, context.Delay,
            context.isReloadWithCookie);
    }
    finally
    {
        // Return the client even if the fetch throws, so the pool is not drained.
        context.ProxyPool.Return(httpClient);
    }
}

public HttpClient Rent() 
{
    lock(_lockObject)
    {
        if (_uninitiliazedDatacenterProxiesAddresses.Count != 0)
        {
            var proxyAddress = _uninitiliazedDatacenterProxiesAddresses.Pop();
            return proxyAddress.GetWebProxy(DataCenterProxiesCredentials).GetHttpClient();
        }
        return _proxiesQueue.Dequeue();
    }
}
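
For illustration, a minimal sketch of a pool that keeps one long-lived HttpClient per proxy, so that each client is created once and then reused. It assumes the proxy is configured on an HttpClientHandler; ProxyClientPool, TryRent and Count are hypothetical names and not the ProxyPool used above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;

// Hypothetical helper: one reusable HttpClient per proxy address.
public sealed class ProxyClientPool
{
    private readonly ConcurrentQueue<HttpClient> _clients =
        new ConcurrentQueue<HttpClient>();

    public ProxyClientPool(IEnumerable<Uri> proxyAddresses, ICredentials credentials)
    {
        foreach (var address in proxyAddresses)
        {
            // The proxy is set on the handler, so each proxy gets its own client.
            var handler = new HttpClientHandler
            {
                Proxy = new WebProxy(address) { Credentials = credentials },
                UseProxy = true
            };
            _clients.Enqueue(new HttpClient(handler));
        }
    }

    // Number of clients currently available for renting.
    public int Count => _clients.Count;

    // Returns false instead of throwing when every client is in use.
    public bool TryRent(out HttpClient client) => _clients.TryDequeue(out client);

    public void Return(HttpClient client) => _clients.Enqueue(client);
}
```

With a pool like this, the number of fetches that can run at the same time is capped by the number of proxies, regardless of MaxDegreeOfParallelism.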

I am a novice in software development, but downloading asynchronously through hundreds or thousands of proxies looks like a trivial task that many people must have faced and solved correctly. So far I have been unable to find a solution to my problem on the internet. Any thoughts on how to achieve maximum download speed?

Upvotes: 3

Views: 198

Answers (1)

Athanasios Kataras

Reputation: 26342

Let's take a look at what happens here:

var result = await url.FetchAsync(httpClient, context.Observer, context.Delay, context.isReloadWithCookie);

You are actually awaiting each fetch before you continue with the next item. That is why this is asynchronous programming and not parallel programming. From the Microsoft docs on async:

The await keyword is where the magic happens. It yields control to the caller of the method that performed await, and it ultimately allows a UI to be responsive or a service to be elastic.

In essence, it frees the calling thread to do other work, but the original calling code is suspended until the I/O operation is done.
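
To make the distinction concrete, here is a small sketch that contrasts awaiting each operation in turn with starting all operations and awaiting them together. FakeFetchAsync is a hypothetical stand-in for a download (it only delays, no network is involved), and AwaitDemo is an illustrative name.

```csharp
using System.Linq;
using System.Threading.Tasks;

public static class AwaitDemo
{
    // Hypothetical stand-in for an I/O-bound download.
    public static async Task<int> FakeFetchAsync(int id)
    {
        await Task.Delay(200);   // the method is suspended here; the thread is released
        return id;
    }

    // Awaits each operation before starting the next one,
    // so the delays add up.
    public static async Task<int[]> OneByOneAsync(int count)
    {
        var results = new int[count];
        for (var i = 0; i < count; i++)
            results[i] = await FakeFetchAsync(i);
        return results;
    }

    // Starts every operation first and only then awaits them,
    // so the delays overlap.
    public static async Task<int[]> AllAtOnceAsync(int count)
    {
        var tasks = Enumerable.Range(0, count)
            .Select(i => FakeFetchAsync(i))
            .ToArray();
        return await Task.WhenAll(tasks);
    }
}
```

Both methods return the same results in the same order; only the way the waiting overlaps differs.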

Now to your problem:

  1. You can either use this excellent solution here: foreach async
  2. You can use the Parallel library to execute your code in different threads.

Something like the following, using Parallel.ForEach (urls is an IEnumerable<Url>, so it has no Count or indexer for Parallel.For):

Parallel.ForEach(urls,
    url => fetchBlcock.Post(url));
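
The linked "foreach async" solution is not reproduced here. As a rough sketch of the general idea, assuming a SemaphoreSlim is used to cap the number of operations in flight, a throttled async loop could look like this. ForEachThrottledAsync and ThrottledExtensions are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledExtensions
{
    // Hypothetical helper: runs body for every item,
    // with at most maxConcurrency operations in flight at once.
    public static async Task ForEachThrottledAsync<T>(
        this IEnumerable<T> source, int maxConcurrency, Func<T, Task> body)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = new List<Task>();
            foreach (var item in source)
            {
                // Wait for a free slot before starting the next operation.
                await gate.WaitAsync();
                tasks.Add(RunAsync(item));
            }

            // All operations have been started; wait for the remaining ones.
            await Task.WhenAll(tasks);

            async Task RunAsync(T value)
            {
                try { await body(value); }
                finally { gate.Release(); }   // free the slot even on failure
            }
        }
    }
}
```

A call might then look like `await urls.ForEachThrottledAsync(128, url => url.FetchAsync(context));`, although that form discards the fetch results, so collecting them would need an extra step.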

Upvotes: 1
