Reputation: 1095
I have a list of URLs, and I need to extract the HTML of each one separately. The URLs:
foo_list = {"example.com", "example.net", "example.org"};
The code I tried:
foreach (string x in foo_list) {
    webBrowser1.Navigate(x);
}
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    string html = webBrowser1.Document.Body.Parent.OuterHtml;
    // handle the html and save to file...
}
The problem is that I only get the HTML and data of the last URL (example.org) in the list. I understand that the Navigate calls in the foreach loop run too fast, so only the last URL gets to wait for DocumentCompleted. How can I handle this problem?
Upvotes: 1
Views: 1340
Reputation: 2854
You can handle it by keeping an index, waiting until the current document has finished loading, and then moving on to the next one:
int index = -1; // class-level field: index of the URL currently being loaded
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    string html = webBrowser1.Document.Body.Parent.OuterHtml;
    // handle the html and save to file...
    if (index + 1 != foo_list.Count) // stop when there are no URLs left
        webBrowser1.Navigate(foo_list[++index]);
}
To start the chain, though, you need to navigate to the first URL in the list yourself. Run this once somewhere else (for example, where the foreach loop was):
if (index + 1 != foo_list.Count)
    webBrowser1.Navigate(foo_list[++index]);
Instead, though, I'd suggest an alternative: WebClient.DownloadString(System.String) downloads the HTML directly as a string, so you can simply iterate over the list and handle each page as it is downloaded.
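A minimal sketch of that alternative, under a few assumptions: the URLs are given an explicit http:// scheme (WebClient expects an absolute address), the list is a List<string> named fooList, and the pageN.html file names are purely illustrative. Note that DownloadString returns the HTML as the server sends it, without running any scripts, so the result can differ from what WebBrowser would show for script-heavy pages.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

class Program
{
    static void Main()
    {
        // URLs from the question; the http:// prefix is an assumption.
        var fooList = new List<string>
        {
            "http://example.com", "http://example.net", "http://example.org"
        };

        using (var client = new WebClient())
        {
            for (int i = 0; i < fooList.Count; i++)
            {
                try
                {
                    // Blocks until the whole response body has been downloaded,
                    // so the pages are fetched strictly one after another.
                    string html = client.DownloadString(fooList[i]);

                    // Hypothetical naming scheme: page0.html, page1.html, ...
                    File.WriteAllText("page" + i + ".html", html);
                }
                catch (WebException ex)
                {
                    // One failing URL should not abort the rest of the list.
                    Console.WriteLine("Failed: " + fooList[i] + " (" + ex.Message + ")");
                }
            }
        }
    }
}
```

Because DownloadString is synchronous, calling it from a UI event handler will freeze the form while each page downloads; running the loop on a background thread is one way around that.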
Upvotes: 2