user2278943

HtmlAgilityPack C#: error 403 Forbidden

I use HtmlAgilityPack to get information from ru-patent.info. Here's the code:

int i = 2449520;

// .....................

web.OverrideEncoding = Encoding.UTF8;
web.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0";
doc = web.Load("http://ru-patent.info/24/49/" + i + ".html");

// Matches "RU" followed by a number, e.g. " RU 2449520"; built once, outside the loop
Regex regex = new Regex(@"\sRU\s\d+");
var blocks = doc.DocumentNode.SelectNodes("//div[@style='padding:10px; border:#999 dotted 1px; background-color:#FFF; background-image:url(/imgs/back.gif);']");
// SelectNodes returns null (not an empty collection) when nothing matches,
// e.g. when the server sends back an error page instead of the patent page
if (blocks != null)
{
    foreach (var t in blocks)
    {
        Match match = regex.Match(t.InnerText);
        sw.WriteLine(i.ToString());
        while (match.Success)
        {
            sw.WriteLine(match.Value);
            match = match.NextMatch();
        }
        sw.WriteLine(); // blank line between records
    }
}
i++;

I also use a timer with an interval of 10 seconds, and there are more than a thousand pages I need to get information from. But after about 30 pages I get a 403 Forbidden error. How can I get around this?

Upvotes: 1

Views: 1110

Answers (1)

outcoldman

Reputation: 11832

A 403 response means the server refuses to fulfill your request. My guess is that this is the server's protection against DDoS (or against scraping). You can use different servers (with different IP addresses), or try taking longer breaks between requests. It is also always a good idea to ask the site owners what the best way to get their information is.
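A minimal sketch of the "take breaks between requests" advice, in C#: instead of a fixed 10-second timer, the scraper waits and retries with exponentially growing pauses whenever the server answers 403. It uses `HttpClient` rather than `HtmlWeb` so the status code can be checked directly. The class name `PoliteFetcher`, the 30-second base delay, the 10-minute cap, and the five-attempt limit are hypothetical values chosen for illustration; nothing in the question says what rate the site actually tolerates.

```csharp
#nullable enable
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: downloads pages slowly and backs off when the server answers 403.
public static class PoliteFetcher
{
    // Exponential backoff: baseDelay * 2^attempt, capped at maxDelay.
    public static TimeSpan NextDelay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (attempt < 0) throw new ArgumentOutOfRangeException(nameof(attempt));
        // Clamp the exponent so Math.Pow cannot overflow for large attempt counts.
        double ms = baseDelay.TotalMilliseconds * Math.Pow(2, Math.Min(attempt, 30));
        return TimeSpan.FromMilliseconds(Math.Min(ms, maxDelay.TotalMilliseconds));
    }

    // Downloads one page. On 403 it waits progressively longer and retries;
    // any other non-success status throws via EnsureSuccessStatusCode.
    // Returns null if the server still refuses after maxAttempts tries.
    public static async Task<string?> FetchAsync(HttpClient client, string url, int maxAttempts = 5)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            using HttpResponseMessage response = await client.GetAsync(url);
            if (response.StatusCode != HttpStatusCode.Forbidden)
            {
                response.EnsureSuccessStatusCode();
                return await response.Content.ReadAsStringAsync();
            }
            // 403: the server is refusing us, so pause before the next attempt.
            await Task.Delay(NextDelay(attempt, TimeSpan.FromSeconds(30), TimeSpan.FromMinutes(10)));
        }
        return null;
    }
}
```

The returned HTML can be handed to HtmlAgilityPack with `var doc = new HtmlDocument(); doc.LoadHtml(html);`, and the question's user agent can be set once with `client.DefaultRequestHeaders.UserAgent.ParseAdd(...)`. Note that this only slows the scraper down so it is less likely to trip the site's limits; if the site keeps blocking, asking the owners is still the better route.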

Upvotes: 1
