Reputation: 2552
I'm writing a batch program that scrapes data from a website.
This is the code:
private async void buttonInfoJobs_Click(object sender, EventArgs e)
{
    const string C_UrlTemplate = "https://www.mysite.it/{0}";
    var _searches = new List<Get_SiteSearchResult>();
    using (JobsDataContext db = new JobsDataContext())
    {
        _searches = db.Get_SiteSearch("JOBS").ToList();
        foreach (var s in _searches)
        {
            WebBrowserJobs wb1 = new WebBrowserJobs();
            Uri uri = new Uri(String.Format(C_UrlTemplate, s.SkillTech));
            wb1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowserJobs_DocumentCompleted);
            wb1.Navigating += new WebBrowserNavigatingEventHandler(webBrowserJobs_Navigating);
            wb1.Url = uri;
        }
    }
}
The problem is that the website I have to scrape uses JavaScript in its pages, so the WebBrowser reloads several times before the page is complete.
This works perfectly if I request just one URL: the DocumentCompleted event fires six times, but in the end I get the required content.
The problem comes when I request different URLs in a loop: the website then requires CAPTCHA validation.
I can avoid this by adding a delay of X seconds during processing, but I don't know how or where:
if I use System.Threading.Thread.Sleep(5000), all execution stops, but I only want to delay the individual WebBrowser task...
How can I proceed?
Upvotes: 0
Views: 42
Reputation: 1312
I've probably also answered your last question about the WebScraper, so I'm going to help you again ;)
You already have the async keyword in the function definition, so you can just use the following code:
await Task.Delay(5000);
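To show where that line would probably go, here is a minimal sketch of a throttled loop. It is not your actual form code: ThrottledLoop, RunAsync and the startNavigation callback are hypothetical names standing in for the body of your foreach (creating the WebBrowserJobs and setting its Url), and the two URLs are made-up examples. The point is only that the await sits inside the loop, after each navigation is started.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ThrottledLoop
{
    // Hypothetical helper: starts one navigation per URL and pauses between
    // them. Unlike Thread.Sleep, awaiting Task.Delay does not block the
    // calling thread while the pause elapses.
    public static async Task RunAsync(
        IEnumerable<string> urls,
        Action<string> startNavigation,
        TimeSpan pause)
    {
        foreach (var url in urls)
        {
            // Stand-in for: create the WebBrowserJobs, attach handlers, set Url.
            startNavigation(url);

            // Wait before starting the next request.
            await Task.Delay(pause);
        }
    }

    public static async Task Main()
    {
        // Example URLs only; in the question they come from C_UrlTemplate.
        var urls = new[] { "https://www.mysite.it/a", "https://www.mysite.it/b" };

        await RunAsync(
            urls,
            url => Console.WriteLine("Navigating to " + url),
            TimeSpan.FromSeconds(5));
    }
}
```

In your click handler the equivalent would be to put await Task.Delay(5000); as the last statement inside the foreach. Because the handler runs on the UI thread, the await should hand control back to the message loop during the pause, so the DocumentCompleted events of the browsers already started can keep firing. Whether 5 seconds is enough to avoid the CAPTCHA is something you would have to find out by trying.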
Upvotes: 1