Reputation: 31
I'm trying to navigate across a website and do some work on the pages programmatically using a WebBrowser control in a Windows Form. I found this while looking for a way to block my thread until the WebBrowser's DocumentCompleted event is triggered. Given that, here's my current code:
public partial class Form1 : Form
{
    // Starts unsignaled; DocumentCompleted is expected to signal it.
    private AutoResetEvent autoResetEvent = new AutoResetEvent(false);

    public Form1()
    {
        InitializeComponent();
    }

    private void button1_Click(object sender, EventArgs e)
    {
        Thread workerThread = new Thread(new ThreadStart(this.DoWork));
        workerThread.SetApartmentState(ApartmentState.STA);
        workerThread.Start();
    }

    private void DoWork()
    {
        WebBrowser browser = new WebBrowser();
        browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(browser_DocumentCompleted);
        browser.Navigate(login_page);
        autoResetEvent.WaitOne();
        // log in
        browser.Navigate(page_to_process);
        autoResetEvent.WaitOne();
        // process the page
    }

    private void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        autoResetEvent.Set();
    }
}
The thread doesn't look necessary, but it will be when I expand this code to accept requests over the network (the thread will listen for connections, then process the requests). Also, I can't just put the processing code inside the DocumentCompleted handler, since I have to navigate to several different pages and do different things on each one.
Now, from what I understand, this doesn't work because the DocumentCompleted event is raised on the same thread that calls WaitOne(), so the event will not fire until WaitOne() returns (never, in this case).
What's interesting is that if I add a WebBrowser control to the form from the toolbox (drag-and-drop), then navigate using that, this code works perfectly (with no changes other than putting the call to Navigate inside a call to Invoke - see below). But if I manually add a WebBrowser control to the Designer file, it doesn't work. And I don't really want a visible WebBrowser on my form, I just want to report the results.
public delegate void NavigateDelegate(string address);
browser.Invoke(new NavigateDelegate(this.browser.Navigate), new string[] { login_page });
My question, then, is: What's the best way to suspend the thread until the browser's DocumentCompleted event fires?
Upvotes: 3
Views: 6880
Reputation: 66
Chris,
Here is a possible implementation that solves the problem, but please look at the comments below on the issues I had to face and fix before everything worked as I expected. Here is an example of a method that performs some actions on a page in a WebBrowser (note that the WebBrowser is part of a Form in my case):
internal ActionResponse CheckMessages() //ActionResponse is a custom class of mine that stores data coming from pages
{
//go to messages
HtmlDocument doc = WbLink.Document; //WbLink is a reference to a WebBrowser instance
HtmlElement ele = doc.GetElementById("message_alert_box");
if (ele == null)
return new ActionResponse(false);
object obj = ele.DomElement;
System.Reflection.MethodInfo mi = obj.GetType().GetMethod("click");
mi.Invoke(obj, new object[0]);
semaphoreForDocCompletedEvent = WaitForDocumentCompleted(); //This is a WaitOne-like statement (1)
if (!semaphoreForDocCompletedEvent)
throw new Exception("Sequencing of DocumentCompleted events failed.");
//get the list
doc = WbLink.Document;
ele = doc.GetElementById("mailz");
ele.WaitForAvailability("mailz", Program.BrowsingSystem.Document, 10000); //This is a WaitOne-like statement (2)
ele = doc.GetElementById("mailz"); //fetch the element again now that the wait is over
//this contains a tbody
HtmlElement tbody = ele.FirstChild;
//count how many elements are espionage reports; these are inline, so each one counts double together with the wrapper on top of it.
int spioCases = 0;
foreach (HtmlElement trs in tbody.Children)
{
if (trs.GetAttribute("id").ToLower().Contains("spio"))
spioCases++;
}
int nMessages = tbody.Children.Count - 2 - spioCases;
//create an array of messages to store data
GameMessage[] archive = new GameMessage[nMessages];
for (int counterOfOpenMessages = 0; counterOfOpenMessages < nMessages; counterOfOpenMessages++)
{
//open first element
WbLink.ScriptErrorsSuppressed = true;
ele = doc.GetElementById("mailz");
//this contains a tbody
tbody = ele.FirstChild;
HtmlElement mess1 = tbody.Children[1];
int idMess1 = int.Parse(mess1.GetAttribute("id").Substring(0, mess1.GetAttribute("id").Length - 2));
//check whether the next element is a spio report; if it is, the element does not have to be opened.
HtmlElement mess1Sibling = mess1.NextSibling;
if (mess1Sibling.GetAttribute("id").ToLower().Contains("spio"))
{
//this is a wrapper for spio report
ReadSpioEntry(archive, counterOfOpenMessages, mess1, mess1Sibling);
//delete first in line
DeleteFirstMessageItem(doc, ref ele, ref obj, ref mi, ref tbody);
semaphoreForDocCompletedEvent = WaitForDocumentCompleted(6); //This is a WaitOne-like statement (3)
}
else
{
//it's a normal message
OpenMessageEntry(ref obj, ref mi, tbody, idMess1); //This opens a modal dialog over the page, and it does not generate a DocumentCompleted event in the WebBrowser
//actually, opening a message generates 2 DocumentCompleted events without any Navigating event being raised
//Application.DoEvents();
semaphoreForDocCompletedEvent = WaitForDocumentCompleted(6);
//read element
ReadMessageEntry(archive, counterOfOpenMessages);
//close current message
CloseMessageEntry(ref ele, ref obj, ref mi); //this closes a modal dialog, so no DocumentCompleted is generated afterwards!
semaphoreForDocCompletedEvent = WaitForDocumentCompleted(6);
//delete first in line
DeleteFirstMessageItem(doc, ref ele, ref obj, ref mi, ref tbody); //this closes a modal dialog, so no DocumentCompleted is generated afterwards!
semaphoreForDocCompletedEvent = WaitForDocumentCompleted(6);
}
}
return new ActionResponse(true, archive);
}
In practice this method takes a page of an MMORPG, reads the messages sent to the account by other players, and stores them in the ActionResponse class via the ReadMessageEntry method.
Apart from the implementation and logic of the code, which are really case-dependent (and not useful for you), there are a few interesting elements that may be worth noting for your case.
I put some comments in the code and highlighted 3 important points [with symbols (1), (2) and (3)].
The algorithm is:
1) Arrive at a page.
2) Get the underlying Document from the WebBrowser.
3) Find an element to click to get to the messages page [done with: HtmlElement ele = doc.GetElementById("message_alert_box");].
4) Trigger the click event on it via the MethodInfo instance and a reflection call [this loads another page, so a DocumentCompleted will arrive sooner or later].
5) Wait for DocumentCompleted to be raised and then proceed [done with: semaphoreForDocCompletedEvent = WaitForDocumentCompleted(); at point (1)].
6) Fetch the new Document from the WebBrowser after the page has changed.
7) Find the particular anchor on the page that marks where the messages I want to read are.
8) Make sure that this tag is present in the page (as some AJAX may delay the content I want to read) [done with: ele.WaitForAvailability("mailz", Program.BrowsingSystem.Document, 10000) at point (2)].
9) Run the loop that reads each message, which means opening a modal dialog that is on the same page and therefore does not generate a DocumentCompleted, reading it when ready, closing it, and looping again. For this particular case I use an overload of (1), called as semaphoreForDocCompletedEvent = WaitForDocumentCompleted(6); at point (3).
Now the three methods I use to pause, check and read:
(1) To block until DocumentCompleted is raised, without overloading the DocumentCompleted handler, which may be used for more than one purpose (as in your case):
private bool WaitForDocumentCompleted()
{
    Thread.SpinWait(1000); //This is dirty, but it works
    //BrowsingSystem is another reference to the browser that is made public in my Form;
    //IsBusy is just a bool set to true when the Navigating event is raised and set to false when DocumentCompleted fires.
    while (Program.BrowsingSystem.IsBusy)
    {
        Application.DoEvents();
        Thread.SpinWait(1000);
    }
    //IsInfoAvailable is just a get-only property that wraps webBrowser.Document inside a lock statement to protect against concurrent access.
    return Program.BrowsingSystem.IsInfoAvailable;
}
(2) Wait for a particular tag to be available in the page:
public static bool WaitForAvailability(this HtmlElement tag, string id, HtmlDocument documentToExtractFrom, long maxCycles)
{
bool cond = true;
long counter = 0;
while (cond)
{
Application.DoEvents(); //VERIFY trovare un modo per rimuovere questa porcheria (Italian for: find a way to get rid of this garbage)
tag = documentToExtractFrom.GetElementById(id);
if (tag != null)
cond = false;
Thread.Yield();
Thread.SpinWait(100000);
counter++;
if (counter > maxCycles)
return false;
}
return true;
}
(3) The dirty trick to wait for a DocumentCompleted that will never arrive, because no frames need to reload on the page:
private bool WaitForDocumentCompleted(int seconds)
{
int counter = 0;
while (Program.BrowsingSystem.IsBusy)
{
Application.DoEvents();
Thread.Sleep(1000);
if (counter == seconds)
{
return true;
}
counter++;
}
return true;
}
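The waits in (1), (2) and (3) all follow the same shape: pump messages until some condition holds or a limit is reached. As a minimal sketch only, that shape could be factored into one helper with a wall-clock timeout. The names PumpingWaitSketch and WaitUntil are hypothetical and not part of the code above, and the helper still relies on Application.DoEvents(), with all the re-entrancy caveats that implies.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Windows.Forms;

public static class PumpingWaitSketch
{
    // Hypothetical helper: pumps the message queue until 'condition' returns true
    // or 'timeout' elapses. Returns true if the condition was met, false on timeout.
    public static bool WaitUntil(Func<bool> condition, TimeSpan timeout)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (!condition())
        {
            if (clock.Elapsed > timeout)
                return false;

            // Lets pending window messages (including the browser's events) be processed.
            Application.DoEvents();

            // Short sleep so the loop does not spin at full CPU.
            Thread.Sleep(10);
        }
        return true;
    }
}
```

Under these assumptions, the wait at point (3) could be written as PumpingWaitSketch.WaitUntil(() => !Program.BrowsingSystem.IsBusy, TimeSpan.FromSeconds(6)), which distinguishes a timeout (false) from a completed load (true).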
Here are also the DocumentCompleted and Navigating handlers, to give you the whole picture of how I used them.
private void webBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
if (Program.BrowsingSystem.BrowserLink.ReadyState == WebBrowserReadyState.Complete)
{
lock (Program.BrowsingSystem.BrowserLocker)
{
Program.BrowsingSystem.ActualPosition = Program.BrowsingSystem.UpdatePosition(Program.BrowsingSystem.Document);
Program.BrowsingSystem.CheckContentAvailability();
Program.BrowsingSystem.IsBusy = false;
}
}
}
private void webBrowser_Navigating(object sender, WebBrowserNavigatingEventArgs e)
{
lock (Program.BrowsingSystem.BrowserLocker)
{
Program.BrowsingSystem.ActualPosition.PageName = OgamePages.OnChange;
Program.BrowsingSystem.IsBusy = true;
}
}
Please have a look here to learn about the mess behind DoEvents() if you're not aware of the details behind the implementation presented here (I hope it is not a problem to link to other sites from Stack Overflow).
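For comparison, one alternative to polling with DoEvents() is to wrap DocumentCompleted in a Task and await it. This is a different technique from the one used in the code above, shown only as a rough sketch: it assumes .NET 4.5 or later (for await), and the names WebBrowserAsyncSketch and NavigateAsync are hypothetical helpers, not part of the WebBrowser API.

```csharp
using System;
using System.Threading.Tasks;
using System.Windows.Forms;

public static class WebBrowserAsyncSketch
{
    // Hypothetical helper: starts a navigation and returns a Task that completes
    // when DocumentCompleted fires with ReadyState == Complete.
    // Must be called on the thread that owns the WebBrowser.
    public static Task NavigateAsync(WebBrowser browser, string address)
    {
        var completion = new TaskCompletionSource<bool>();
        WebBrowserDocumentCompletedEventHandler handler = null;

        handler = (sender, e) =>
        {
            // DocumentCompleted can fire more than once (for example, per frame),
            // so keep waiting until the whole document reports Complete.
            if (browser.ReadyState != WebBrowserReadyState.Complete)
                return;

            browser.DocumentCompleted -= handler; // one-shot: unsubscribe after the first match
            completion.TrySetResult(true);
        };

        browser.DocumentCompleted += handler;
        browser.Navigate(address);
        return completion.Task;
    }
}
```

From an async event handler running on the browser's thread, awaiting WebBrowserAsyncSketch.NavigateAsync(browser, address) hands control back to the message loop while the page loads, so the event can be delivered without blocking the thread and without calling DoEvents(). It does not cover the modal-dialog case of point (3), where no DocumentCompleted is raised at all.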
A small final note on why you need to put the call to your Navigate method inside an Invoke when you use it from a Form instance: methods that work on the WebBrowser (or even just hold it in scope as a referenced variable) must run on the same thread as the WebBrowser itself.
Moreover, if the WebBrowser is a child of some kind of Form container, the thread that instantiates it must be the same thread that created the Form, and by transitivity all the methods that work on the WebBrowser must be called on the Form's thread (in your case, Invoke moves your calls onto the Form's own thread). I hope this is useful for you (I left a //VERIFY comment in the code in my native language to let you know what I think about Application.DoEvents()).
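As a small illustration of that threading rule, here is a sketch of a helper that marshals Navigate onto the thread that owns the control. The names BrowserInvokeSketch and NavigateOnOwnerThread are hypothetical; InvokeRequired and Invoke are the standard Control members.

```csharp
using System;
using System.Windows.Forms;

public static class BrowserInvokeSketch
{
    // Hypothetical helper: runs Navigate on the thread that owns the control.
    public static void NavigateOnOwnerThread(WebBrowser browser, string address)
    {
        if (browser.InvokeRequired)
        {
            // Called from another thread: Invoke blocks the caller until the
            // owning thread has executed the delegate.
            browser.Invoke(new Action(() => browser.Navigate(address)));
        }
        else
        {
            // Already on the owning thread: call directly.
            browser.Navigate(address);
        }
    }
}
```

Note that Invoke only works if the owning thread is running a message loop; if that thread is itself blocked, the call never returns.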
Kind regards, Alex
Upvotes: 1
Reputation: 2379
EDIT
Register the handler in the InitializeComponent method like this, instead of in the same method:
WebBrowser browser = new WebBrowser();
browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser_DocumentCompleted);
ReadyState will tell you the progress of the document loading when checked in the DocumentCompleted event.
void webBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
if (browser.ReadyState == WebBrowserReadyState.Complete)
{
}
}
Upvotes: 0
Reputation: 7923
HAH! I had the same question. You can do this with event handling. If you stop a thread midway through the page, it will need to wait until the page finishes. You can easily do this by attaching a handler:
Page.LoadComplete += new EventHandler(triggerFunction);
In the triggerFunction you can do this:
void triggerFunction(object sender, EventArgs e)
{
autoResetEvent.Set(); // signal the waiting thread that the page has finished loading
}
Let me know if this works. I ended up not using threads in mine and instead just put the work into triggerFunction. Some syntax might not be 100% correct because I am answering off the top of my head.
Upvotes: 0