TaylorM

Reputation: 109

Trouble with WebClient and Thread management

I've been tearing my hair out trying to figure out this problem. I'm using a WebClient that reads in a dynamic URL. The data I am trying to extract isn't in the HTML source when it is retrieved from the server; it is rendered later with JavaScript/AJAX.

I've used multiple methods including Thread.Join() and BackgroundWorker with zero results.

I'm now trying to use an async method, but to be honest I'm totally lost as to what I'm doing.

Here is my code:

protected void retrieveDataSource(int matchId_val)
{
    ManualResetEvent completionEvent = new ManualResetEvent(false);
    WebClient wc = new WebClient();
    wc.DownloadStringCompleted += delegate(object sender, DownloadStringCompletedEventArgs e)
    {
        source = e.Result;
        completionEvent.Set();
    };
    wc.DownloadStringAsync(new Uri("http://na.lolesports.com/tourney/match/" + matchId_val));
}

protected void LoadWebPage()
{
    retrieveDataSource(matchId_val);
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(source);
}

source is a class-level field that is initially null. When I run this code, the DownloadStringCompleted handler has not run by the time doc.LoadHtml(source) executes, so source is still null and I get a null exception. IT SHOULD BE NOTED that if I hit 'Continue', the debugger then stops at my breakpoint inside the DownloadStringCompleted handler, which is beyond me.

If anyone can help me I'd greatly appreciate it as I've already spent my entire morning trying to wrap my mind around the issue.

Upvotes: 0

Views: 105

Answers (1)

Peter Duniho

Reputation: 70701

There are a number of difficulties you may well run into as you attempt this. The bottom line is that to get at dynamically generated content, you have to render the page, which is a very different operation from simply downloading what the HTTP server gives you for a given URL.

In addition, it's not clear what you're using to render the web page. You are using a class named HtmlDocument and a method named LoadHtml(). This suggests that you are using Html Agility Pack, but your question is silent on that point. To my recollection, that library only parses HTML and doesn't render it or run scripts, but I could be wrong or have out-of-date information.

All that said, there is a very clear bug in your code. You create an event handle, which is apparently used to signal the completion of the asynchronous operation, but you never wait on it. This means that the thread that started the I/O will just keep going and attempt to retrieve the result before it is actually available.

One way to address that would be to wait on the event handle:

protected void retrieveDataSource(int matchId_val)
{
    // Signaled by the completion handler once the download has finished.
    ManualResetEvent completionEvent = new ManualResetEvent(false);
    WebClient wc = new WebClient();
    wc.DownloadStringCompleted += delegate(object sender, DownloadStringCompletedEventArgs e)
    {
        source = e.Result;
        completionEvent.Set();
    };
    wc.DownloadStringAsync(new Uri("http://na.lolesports.com/tourney/match/" + matchId_val));
    // Block the calling thread until the handler above signals completion.
    completionEvent.WaitOne();
}

Of course, if you're just going to make the thread block while you wait for the operation to complete, that raises the question of why you are using asynchronous I/O at all. Why not just call DownloadString() instead, which blocks until the operation is done?

I also advise against the use of a class field for the purpose of passing data from a called method to the caller. It would make more sense here for retrieveDataSource() to return the result to the caller directly. Were the code written in that way, the issue with the thread synchronization would have been more readily apparent, as you likely would have noticed the method returning before that value was actually available.
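
As a rough sketch of those two suggestions together (a blocking download, and a return value instead of a field), the method could look something like the following. The class name MatchPageLoader, the helper BuildMatchUri() and the PascalCase method name are my own illustrative choices, not anything from your code; the URL is the one from your question.

```csharp
using System;
using System.Net;

public static class MatchPageLoader
{
    // Hypothetical helper: builds the match URL shown in the question.
    public static Uri BuildMatchUri(int matchId)
    {
        return new Uri("http://na.lolesports.com/tourney/match/" + matchId);
    }

    // Blocks until the download has finished, then hands the HTML
    // straight back to the caller, so no shared field is needed.
    public static string RetrieveDataSource(int matchId)
    {
        using (WebClient wc = new WebClient())
        {
            return wc.DownloadString(BuildMatchUri(matchId));
        }
    }
}
```

The caller would then write something like string source = MatchPageLoader.RetrieveDataSource(matchId_val); before calling doc.LoadHtml(source). Note that this still only returns what the server sends, not anything generated later by script.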

But if you insist on using the asynchronous method, the above change should at least resolve your thread synchronization issue.
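
If you can target .NET 4.5 or later, another option is to let async/await do the waiting for you rather than managing an event handle yourself. The following is only a minimal sketch under that assumption: the class and method names are hypothetical, and it relies on WebClient.DownloadStringTaskAsync(), which returns a Task<string>. The calling method would have to be async as well and await the result.

```csharp
using System;
using System.Net;
using System.Threading.Tasks;

public static class AsyncMatchPageLoader
{
    // Hypothetical name. Returns a task that completes with the
    // downloaded HTML, without blocking the calling thread.
    public static async Task<string> RetrieveDataSourceAsync(Uri address)
    {
        using (WebClient wc = new WebClient())
        {
            // Execution resumes here only after the download has completed.
            return await wc.DownloadStringTaskAsync(address);
        }
    }
}
```

A caller would use it as string source = await AsyncMatchPageLoader.RetrieveDataSourceAsync(uri); so the result cannot be read before it exists.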

Upvotes: 2
