Alejandro

Reputation: 308

HttpClient ReadAsStringAsync with progress

Is there a way to get the progress of the ReadAsStringAsync() method? I am just getting the HTML content of a website and parsing.

public static async Task<returnType> GetStartup(string url = "http://")
{
    using (HttpClient client = new HttpClient())
    {
        client.DefaultRequestHeaders.Add("User-Agent",
            "Mozilla/5.0 (compatible, MSIE 11, Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko");
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            using (HttpContent content = response.Content)
            {
                string result = await content.ReadAsStringAsync();
            }
        }
    }
}

Upvotes: 4

Views: 2661

Answers (1)

Dai

Reputation: 155055

Is there a way to get the progress of the ReadAsStringAsync() method? I am just getting the html content of a website and parsing.

Yes and no.

HttpClient does not expose timing or progress information from the underlying network stack, but you can recover some of it by combining HttpCompletionOption.ResponseHeadersRead, the Content-Length header, and reading the response body yourself with your own StreamReader (asynchronously, of course).

Do note that the Content-Length response header gives the length of the body as transferred, that is, the compressed length when the response is compressed, not the length of the decompressed content. This complicates things because most web servers today probably serve HTML (and static content) with gzip compression (as either Content-Encoding or Transfer-Encoding). Unfortunately, while HttpClient can do automatic gzip decompression for you, it won't tell you what the decompressed content length is.
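As a rough sketch of that caveat, a helper along these lines could decide whether Content-Length is usable as a progress total at all. The class and method names (ContentLengthSketch, GetProgressTotal) are hypothetical and not part of HttpClient, and treating any Content-Encoding value as "length unknown" is an assumption made for illustration:

```csharp
using System.Net.Http;
using System.Net.Http.Headers;

public static class ContentLengthSketch
{
    // Hypothetical helper: returns a byte count usable as a progress total, or null
    // when the header is missing or the body is content-encoded (e.g. gzip), in
    // which case Content-Length counts compressed bytes rather than decoded text.
    public static long? GetProgressTotal(HttpResponseMessage response)
    {
        if (response.Content == null)
        {
            return null; // No body at all (possible on older .NET versions).
        }

        HttpContentHeaders headers = response.Content.Headers;
        if (headers.ContentEncoding.Count > 0)
        {
            return null; // Compressed on the wire: the length is not the text length.
        }

        return headers.ContentLength; // May itself be null if the server omitted it.
    }
}
```

When this returns null, the caller can still report how much has been read so far, just without a total.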

But you can still report some kinds of progress back to your method's consumer; see below for an example. You should do this using the idiomatic .NET IProgress<T> interface rather than rolling your own mechanism.

Like so:

using System;
using System.Globalization;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

private static readonly HttpClient _hc = new HttpClient()
{
    // NOTE: Automatic decompression is deliberately not enabled on this HttpClient so that Content-Length can be safely used. The trade-off is that downloads can be much slower.
    DefaultRequestHeaders =
    {
        { "User-Agent", "Mozilla/5.0 (compatible, MSIE 11, Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko" }
    }
};

// Returns the response body as a string so the caller can parse it.
public static async Task<string> GetStartupAsync( IProgress<String> progress, string url = "http://" )
{
    progress.Report( "Now making HTTP request..." );

    // ResponseHeadersRead makes GetAsync return as soon as the headers arrive, before the body is buffered.
    using( HttpResponseMessage response = await _hc.GetAsync( url, HttpCompletionOption.ResponseHeadersRead ).ConfigureAwait(false) )
    {
        progress.Report( "Received HTTP response. Now reading response content..." );

        string result;
        Int64? responseLength = response.Content.Headers.ContentLength;
        if( responseLength.HasValue )
        {
            using( Stream responseStream = await response.Content.ReadAsStreamAsync().ConfigureAwait(false) )
            using( StreamReader rdr = new StreamReader( responseStream ) )
            {
                // `capacity` is an Int32 count of UTF-16 chars but responseLength is an Int64 byte count (for mostly-ASCII UTF-8 they are close).
                // The initial allocation is capped at 1M chars; the StringBuilder grows as needed.
                StringBuilder sb = new StringBuilder( capacity: (Int32)Math.Min( responseLength.Value, 1024 * 1024 ) );

                Char[] charBuffer = new Char[4096];
                while( true )
                {
                    Int32 read = await rdr.ReadAsync( charBuffer, 0, charBuffer.Length ).ConfigureAwait(false);
                    if( read == 0 )
                    {
                        // Reached end.
                        break;
                    }

                    sb.Append( charBuffer, 0, read );
                    progress.Report( String.Format( CultureInfo.CurrentCulture, "Read {0:N0} / {1:N0} chars (or bytes).", sb.Length, responseLength.Value ) );
                }

                result = sb.ToString();
            }
        }
        else
        {
            progress.Report( "No Content-Length header in response. Will read response until EOF." );

            result = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
        }

        progress.Report( "Finished reading response content." );
        return result;
    }
}

Notes:

  • In general, any async method or method returning a Task/Task<T> should be named with an Async suffix, so your method should be named GetStartupAsync, not GetStartup.
  • Unless you have an IHttpClientFactory available, you should not wrap an HttpClient in a using block, because creating and disposing many instances can cause system resource exhaustion (typically sockets), especially in server applications.
    • The reasons for this are complicated and may differ depending on your .NET implementation (e.g. I believe Xamarin's HttpClient doesn't have this problem), but I won't go into details here.
    • So you can safely ignore any Code Analysis warning about not disposing of your HttpClient. This is one of the few exceptions to the rule about always disposing of any IDisposable objects that you create or own.
    • As HttpClient is thread-safe and this is a static method, consider using a cached static instance instead.
  • You also don't need to wrap HttpResponseMessage.Content in a using block, as the Content object is owned by the HttpResponseMessage.
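
On the consuming side, a caller would typically pass in a Progress<T> instance. The following is only a sketch: ProgressCallerSketch, FakeDownloadAsync and RunAsync are hypothetical names, and FakeDownloadAsync is a stand-in that reports two messages without touching the network, so the example can run on its own:

```csharp
using System;
using System.Threading.Tasks;

public static class ProgressCallerSketch
{
    // Hypothetical stand-in for a download method that accepts IProgress<string>;
    // it reports two messages and returns a fixed string instead of using the network.
    public static async Task<string> FakeDownloadAsync(IProgress<string> progress)
    {
        progress.Report("Now making HTTP request...");
        await Task.Delay(10).ConfigureAwait(false); // Simulate I/O latency.
        progress.Report("Finished reading response content.");
        return "<html></html>";
    }

    public static async Task<string> RunAsync()
    {
        // Progress<T> invokes the callback on the SynchronizationContext captured at
        // construction (e.g. a UI thread), or on the thread pool if there is none.
        var progress = new Progress<string>(message => Console.WriteLine(message));
        return await FakeDownloadAsync(progress).ConfigureAwait(false);
    }
}
```

Because Progress<T> posts its callbacks rather than running them inline, the messages may be printed slightly after the corresponding Report call returns.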

Upvotes: 6
