DarioN1

Reputation: 2552

Sleep the WebBrowser instance, not the entire program, while scraping

I'm writing a batch program that scrapes data from a website.

This is the code:

private async void buttonInfoJobs_Click(object sender, EventArgs e)
{
    const string C_UrlTemplate = "https://www.mysite.it/{0}";

    using (JobsDataContext db = new JobsDataContext())
    {
        // Load the searches to run, then start one browser navigation per search.
        List<Get_SiteSearchResult> _searches = db.Get_SiteSearch("JOBS").ToList();
        foreach (var s in _searches)
        {
            WebBrowserJobs wb1 = new WebBrowserJobs();
            Uri uri = new Uri(String.Format(C_UrlTemplate, s.SkillTech));

            wb1.DocumentCompleted += webBrowserJobs_DocumentCompleted;
            wb1.Navigating += webBrowserJobs_Navigating;

            // Setting Url starts the navigation for this search.
            wb1.Url = uri;
        }
    }
}

The problem is that the website I have to scrape uses JavaScript in its pages, so the WebBrowser reloads several times before the page is fully loaded.

This works perfectly when I request a single URL: the DocumentCompleted event fires six times, but in the end I get the required content.

The problem comes when I request several URLs in a loop: the website then requires CAPTCHA validation.

I can avoid this by adding a delay of X seconds during processing, but I don't know how or where to do it.

If I use System.Threading.Thread.Sleep(5000), the whole program is blocked, but I only want to delay the individual WebBrowser task.

How can I proceed?

Upvotes: 0

Views: 42

Answers (1)

Simon

Reputation: 1312

I've probably also answered your last question about the WebScraper, so I'm going to help you again ;)

You already have the async keyword in the function definition, so you can just use the following code:

await Task.Delay(5000);
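
To make the placement concrete, here is a minimal sketch of a loop that pauses between requests without blocking the calling thread. It is a console stand-in, not the WinForms handler from the question: StartNavigation is a hypothetical placeholder for creating the WebBrowserJobs instance and setting its Url, and the skill names passed in Main are made-up sample values. Note that the pause is counted from the moment a navigation is started, not from the moment the page finishes loading.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class DelayedLoopSketch
{
    // Hypothetical stand-in for "new WebBrowserJobs() ... wb1.Url = uri".
    private static void StartNavigation(Uri uri)
    {
        Console.WriteLine($"{DateTime.Now:HH:mm:ss} navigating to {uri}");
    }

    // Starts one navigation per skill and waits 'pause' after each one.
    public static async Task ProcessAllAsync(IEnumerable<string> skills, TimeSpan pause)
    {
        const string urlTemplate = "https://www.mysite.it/{0}";

        foreach (var skill in skills)
        {
            StartNavigation(new Uri(string.Format(urlTemplate, skill)));

            // Unlike Thread.Sleep, awaiting Task.Delay does not block the thread:
            // the method is suspended here and resumed when the delay elapses.
            await Task.Delay(pause);
        }
    }

    public static async Task Main()
    {
        // Sample values only; the question reads these from the database.
        await ProcessAllAsync(new[] { "csharp", "sql" }, TimeSpan.FromSeconds(5));
    }
}
```

In the original click handler, the equivalent change would be to put the await Task.Delay(5000); line at the end of the foreach body, after wb1.Url = uri;. Because the handler is already async, the UI thread stays free to run the DocumentCompleted events while the loop is waiting.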

Upvotes: 1
