Chris

Reputation: 31

Suspend thread until WebBrowser has finished loading

I'm trying to navigate across a website and do some work on the pages programmatically using a WebBrowser control in a Windows Form. I found this approach while looking for a way to block my thread until the WebBrowser's DocumentCompleted event fires. Based on it, here's my current code:

public partial class Form1 : Form
{
    private AutoResetEvent autoResetEvent = new AutoResetEvent(false); // starts unsignaled

    public Form1()
    {
        InitializeComponent();
    }

    private void button1_Click(object sender, EventArgs e)
    {
        Thread workerThread = new Thread(new ThreadStart(this.DoWork));
        workerThread.SetApartmentState(ApartmentState.STA);
        workerThread.Start();
    }

    private void DoWork()
    {
        WebBrowser browser = new WebBrowser();
        browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(browser_DocumentCompleted);
        browser.Navigate(login_page);
        autoResetEvent.WaitOne();
        // log in

        browser.Navigate(page_to_process);
        autoResetEvent.WaitOne();
        // process the page
    }

    private void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        autoResetEvent.Set();
    }
}

The thread doesn't look necessary, but it will be when I expand this code to accept requests over the network (the thread will listen for connections, then process the requests). Also, I can't just put the processing code inside the DocumentCompleted handler, since I have to navigate to several different pages and do different things on each one.

Now, from what I understand, this doesn't work because the DocumentCompleted event is raised on the same thread that calls WaitOne(), so the event can't fire until WaitOne() returns (which, in this case, is never).
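To illustrate what I mean, here is a minimal sketch with no WebBrowser involved. GateDemo and its method names are made up for this illustration; the timeouts are only there so the first wait gives up instead of hanging forever.

```csharp
using System;
using System.Threading;

static class GateDemo
{
    // Nobody else can signal the gate: the only Set() call sits on the
    // same thread, after the wait, so the wait can only time out.
    public static bool WaitWithoutSignaller()
    {
        using (var gate = new AutoResetEvent(false))
        {
            bool signalled = gate.WaitOne(200); // blocks this thread
            gate.Set();                         // too late to help the wait above
            return signalled;
        }
    }

    // A second thread calls Set(), so the waiting thread is released.
    public static bool WaitWithSignallerThread()
    {
        using (var gate = new AutoResetEvent(false))
        {
            var signaller = new Thread(() => gate.Set());
            signaller.Start();
            bool signalled = gate.WaitOne(2000);
            signaller.Join();
            return signalled;
        }
    }
}
```

The first method is my situation as I understand it: the thread that is supposed to deliver the signal is the one that is blocked.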

What's interesting is that if I add a WebBrowser control to the form from the toolbox (drag-and-drop), then navigate using that, this code works perfectly (with no changes other than putting the call to Navigate inside a call to Invoke - see below). But if I manually add a WebBrowser control to the Designer file, it doesn't work. And I don't really want a visible WebBrowser on my form, I just want to report the results.

public delegate void NavigateDelegate(string address);
browser.Invoke(new NavigateDelegate(this.browser.Navigate), new string[] { login_page });

My question, then, is: What's the best way to suspend the thread until the browser's DocumentCompleted event fires?

Upvotes: 3

Views: 6880

Answers (3)

Pinoba

Reputation: 66

Chris,

Here is a possible implementation that solves the problem, but please read the comments below on the issues I had to face and fix before everything worked as I expected. This is an example of a method that performs some actions on a page in a WebBrowser (note that in my case the WebBrowser is part of a Form):

    internal ActionResponse CheckMessages() //ActionResponse is a custom class of mine that stores data read from pages
        {
        //go to messages
        HtmlDocument doc = WbLink.Document; //WbLink is a reference to a WebBrowser instance
        HtmlElement ele = doc.GetElementById("message_alert_box");
        if (ele == null)
            return new ActionResponse(false);

        object obj = ele.DomElement;
        System.Reflection.MethodInfo mi = obj.GetType().GetMethod("click");
        mi.Invoke(obj, new object[0]);

        semaphoreForDocCompletedEvent = WaitForDocumentCompleted();  //This is a WaitOne-like statement (1)
        if (!semaphoreForDocCompletedEvent)
            throw new Exception("Sequencing of DocumentCompleted events failed.");

        //get the list
        doc = WbLink.Document;
        ele = doc.GetElementById("mailz");
        if (!ele.WaitForAvailability("mailz", Program.BrowsingSystem.Document, 10000)) //This is a WaitOne-like statement (2)
            return new ActionResponse(false); //the tag never appeared: give up instead of dereferencing null below
        ele = doc.GetElementById("mailz");

        //this contains a tbody
        HtmlElement tbody = ele.FirstChild;

        //count how many elements are espionage reports; these elements are inline, so they count double together with the wrappers on top of them.
        int spioCases = 0;
        foreach (HtmlElement trs in tbody.Children)
        {
            if (trs.GetAttribute("id").ToLower().Contains("spio"))
                spioCases++;
        }

        int nMessages = tbody.Children.Count - 2 - spioCases;

        //create an array of messages to store data
        GameMessage[] archive = new GameMessage[nMessages];

        for (int counterOfOpenMessages = 0; counterOfOpenMessages < nMessages; counterOfOpenMessages++)
        {

            //open first element
            WbLink.ScriptErrorsSuppressed = true;
            ele = doc.GetElementById("mailz");
            //this contains a tbody
            tbody = ele.FirstChild;

            HtmlElement mess1 = tbody.Children[1];
            int idMess1 = int.Parse(mess1.GetAttribute("id").Substring(0, mess1.GetAttribute("id").Length - 2));
            //check if subsequent element is not a spio report, in case it is then the element has not to be opened.
            HtmlElement mess1Sibling = mess1.NextSibling;
            if (mess1Sibling.GetAttribute("id").ToLower().Contains("spio"))
            {
                //this is a wrapper for spio report
                ReadSpioEntry(archive, counterOfOpenMessages, mess1, mess1Sibling);
                //delete first in line
                DeleteFirstMessageItem(doc, ref ele, ref obj, ref mi, ref tbody);
                semaphoreForDocCompletedEvent = WaitForDocumentCompleted(6); //This is a simil-waitOne statement (3)

            }
            else
            {
                //It's a normal message
                OpenMessageEntry(ref obj, ref mi, tbody, idMess1); //This opens a modal dialog over the page and does not generate a DocumentCompleted event in the WebBrowser

                //actually, opening a message generates 2 DocumentCompleted events without any Navigating event being raised
                //Application.DoEvents();
                semaphoreForDocCompletedEvent = WaitForDocumentCompleted(6);

                //read element
                ReadMessageEntry(archive, counterOfOpenMessages);

                //close current message
                CloseMessageEntry(ref ele, ref obj, ref mi);  //this closes a modal dialog and therefore does not generate a DocumentCompleted afterwards!
                semaphoreForDocCompletedEvent = WaitForDocumentCompleted(6);
                //delete first in line
                DeleteFirstMessageItem(doc, ref ele, ref obj, ref mi, ref tbody); //this closes a modal dialog and therefore does not generate a DocumentCompleted afterwards!
                semaphoreForDocCompletedEvent = WaitForDocumentCompleted(6);
            }
        }
        return new ActionResponse(true, archive);
    }

In practice, this method takes a page of an MMORPG, reads the messages sent to the account by other players, and stores them in the ActionResponse class via the ReadMessageEntry method.

Apart from the implementation and logic of the code, which are really case-dependent (and not useful to you), there are a few interesting elements worth noting for your case. I put some comments in the code and highlighted 3 important points [with the markers (1), (2) and (3)].

The algorithm is:

1) Arrive at a page

2) Get the underlying Document from the WebBrowser

3) Find an element to click to get to the messages page [done with: HtmlElement ele = doc.GetElementById("message_alert_box");]

4) Trigger the click event on it via the MethodInfo instance and a reflection call [this loads another page, so a DocumentCompleted will arrive sooner or later]

5) Wait for DocumentCompleted to be raised and then proceed [done with: semaphoreForDocCompletedEvent = WaitForDocumentCompleted(); at point (1)]

6) Fetch the new Document from the WebBrowser after the page has changed

7) Find a particular anchor on the page that marks where the messages I want to read are

8) Make sure that this tag is present in the page (some AJAX may delay the content I want to read) [done with: ele.WaitForAvailability("mailz", Program.BrowsingSystem.Document, 10000), which is point (2)]

9) Run the loop that reads each message: open a modal dialog that lives on the same page (and therefore does not generate a DocumentCompleted), read it when ready, close it, and loop again. For this particular case I use an overload of (1), called as semaphoreForDocCompletedEvent = WaitForDocumentCompleted(6); at point (3)

Now the three methods I use to pause, check and read:

(1) To wait until DocumentCompleted is raised without overloading the DocumentCompleted handler, which may be used for more than one purpose (as in your case):

private bool WaitForDocumentCompleted()
        {
            Thread.SpinWait(1000);  //This is dirty but working
            while (Program.BrowsingSystem.IsBusy) //BrowsingSystem is another reference to the browser, made public in my Form; IsBusy is just a bool set to true when the Navigating event is raised and set to false when DocumentCompleted fires.
            {
                Application.DoEvents();
                Thread.SpinWait(1000);
            }

            //IsInfoAvailable is just a get property that wraps webBrowser.Document inside a lock statement to protect against concurrent access.
            return Program.BrowsingSystem.IsInfoAvailable;
        }

(2) Wait for a particular tag to be available in the page:

public static bool WaitForAvailability(this HtmlElement tag, string id, HtmlDocument documentToExtractFrom, long maxCycles)
        {
            bool cond = true;
            long counter = 0;
            while (cond)
            {
                Application.DoEvents(); //VERIFY find a way to remove this junk
                tag = documentToExtractFrom.GetElementById(id);
                if (tag != null)
                    cond = false;
                Thread.Yield();
                Thread.SpinWait(100000);
                counter++;
                if (counter > maxCycles)
                    return false;
            }
            return true;
        }

(3) The dirty trick to wait for a DocumentCompleted that will never arrive, because no frames need to reload on the page!

private bool WaitForDocumentCompleted(int seconds)
    {
        int counter = 0;
        while (Program.BrowsingSystem.IsBusy)
        {
            Application.DoEvents();
            Thread.Sleep(1000);
            if (counter == seconds)
            {
                return true;
            }
            counter++;
        }
        return true;
    }
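
Both waits above boil down to the same loop: check a condition, let pending events run, try again. As a hedged sketch only (Polling and WaitUntil are names I am making up here, they are not in my project), the loop can be written once with a wall-clock deadline instead of a cycle count, reporting a timeout as false rather than always returning true as overload (3) does:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class Polling
{
    // Polls 'condition' until it returns true or 'timeout' elapses.
    // 'pump' runs on every pass; in a WinForms app it would be
    // Application.DoEvents, so queued events can still be delivered.
    public static bool WaitUntil(Func<bool> condition, TimeSpan timeout, Action pump)
    {
        var clock = Stopwatch.StartNew();
        while (!condition())
        {
            if (clock.Elapsed >= timeout)
                return false;   // deadline passed, condition still false
            pump();
            Thread.Sleep(10);   // short pause instead of a spin-wait
        }
        return true;
    }
}
```

In the code above it would be called as Polling.WaitUntil(() => !Program.BrowsingSystem.IsBusy, TimeSpan.FromSeconds(6), Application.DoEvents). It still relies on DoEvents, so it inherits the same re-entrancy caveats.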

Here are also the DocumentCompleted and Navigating handlers, to give you the whole picture of how I used them.

private void webBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            if (Program.BrowsingSystem.BrowserLink.ReadyState == WebBrowserReadyState.Complete)
            {
                lock (Program.BrowsingSystem.BrowserLocker)
                {
                    Program.BrowsingSystem.ActualPosition = Program.BrowsingSystem.UpdatePosition(Program.BrowsingSystem.Document);
                    Program.BrowsingSystem.CheckContentAvailability();
                    Program.BrowsingSystem.IsBusy = false;
                }
            }
        }

private void webBrowser_Navigating(object sender, WebBrowserNavigatingEventArgs e)
        {
            lock (Program.BrowsingSystem.BrowserLocker)
            {
                Program.BrowsingSystem.ActualPosition.PageName = OgamePages.OnChange;
                Program.BrowsingSystem.IsBusy = true;
            }
        }

Please have a look here to learn about the mess behind DoEvents() if you're not aware of the details behind the implementation presented here (I hope it is not a problem to link to other sites from Stack Overflow).
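
If the re-entrancy that DoEvents() brings is a concern, one alternative (not what my code above does) is to wrap the completion event in a Task and await it instead of polling. A minimal sketch, assuming C# 6 or later: FakeBrowser is a made-up stand-in for the WebBrowser so the sketch runs without a form, and BrowserTasks and NavigateAsync are illustrative names, not part of any library.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for WebBrowser: raises DocumentCompleted
// some time after Navigate is called.
class FakeBrowser
{
    public event EventHandler DocumentCompleted;
    public string Url { get; private set; }

    public void Navigate(string url)
    {
        Url = url;
        // A real control would raise this later from its message loop.
        Task.Run(() => DocumentCompleted?.Invoke(this, EventArgs.Empty));
    }
}

static class BrowserTasks
{
    // Starts a navigation and returns a task that completes on the
    // next DocumentCompleted event.
    public static Task NavigateAsync(FakeBrowser browser, string url)
    {
        var tcs = new TaskCompletionSource<bool>();
        EventHandler handler = null;
        handler = (sender, e) =>
        {
            browser.DocumentCompleted -= handler; // one-shot subscription
            tcs.TrySetResult(true);
        };
        browser.DocumentCompleted += handler;     // subscribe before navigating
        browser.Navigate(url);
        return tcs.Task;
    }
}
```

With a real WebBrowser the handler type would be WebBrowserDocumentCompletedEventHandler, and the await would have to sit in an async method running on the thread that owns the control, so that its message loop keeps pumping while the page loads.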

A small final note on why you need to put the call to your Navigate method inside an Invoke when you use it from a Form instance: the methods that work on the WebBrowser (or even just have it in scope as a referenced variable) must run on the same thread as the WebBrowser itself!

Moreover, if the WebBrowser is a child of some kind of Form container, the thread that instantiates it must be the same one that created the Form, and by transitivity all the methods that work on the WebBrowser must be called on the Form's thread (in your case, Invoke moves your calls onto the Form's own thread). I hope this is useful to you (I left a //VERIFY comment in the code, originally written in my native language, to let you know what I think about Application.DoEvents()).

Kind regards, Alex

Upvotes: 1

Ebad Masood

Reputation: 2379

EDIT

Register the handler in the InitializeComponent method like this, instead of in the same method.

WebBrowser browser = new WebBrowser();
browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser_DocumentCompleted);

ReadyState will tell you the progress of the document loading when checked in the DocumentCompleted event.

void webBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    if (browser.ReadyState == WebBrowserReadyState.Complete)
    {

    }
}

Upvotes: 0

Serguei Fedorov

Reputation: 7923

HAH! I had the same question. You can do this with event handling. If you suspend a thread partway through, it needs to wait until the page finishes loading. You can easily do this by attaching a handler:

 Page.LoadComplete += new EventHandler(triggerFunction);

In the triggerFunction you can do this

void triggerFunction(object sender, EventArgs e)
{
     autoResetEvent.Set(); // Set(), not Reset(), releases the thread blocked in WaitOne()
}

Let me know if this works. I ended up not using threads in mine and instead just put the work into triggerFunction. Some syntax might not be 100% correct because I am answering off the top of my head.

Upvotes: 0
