Bob Mc

Reputation: 2008

HttpWebRequest Timeouts After Ten Consecutive Requests

I'm writing a web crawler for a specific site. The application is a VB.Net Windows Forms application that does not use multiple threads; each web request is made sequentially. However, after ten successful page retrievals, every subsequent request times out.

I have reviewed the similar questions already posted here on SO, and have implemented the recommended techniques into my GetPage routine, shown below:

Public Function GetPage(ByVal url As String) As String
    Dim result As String = String.Empty

    Dim uri As New Uri(url)
    Dim sp As ServicePoint = ServicePointManager.FindServicePoint(uri)
    sp.ConnectionLimit = 100

Dim request As HttpWebRequest = DirectCast(WebRequest.Create(uri), HttpWebRequest)
    request.KeepAlive = False
    request.Timeout = 15000

    Try
        Using response As HttpWebResponse = DirectCast(request.GetResponse, HttpWebResponse)
            Using dataStream As Stream = response.GetResponseStream()
                Using reader As New StreamReader(dataStream)
                    If response.StatusCode <> HttpStatusCode.OK Then
                        Throw New Exception("Got response status code: " & response.StatusCode.ToString())
                    End If
                    result = reader.ReadToEnd()
                End Using
            End Using
            response.Close()
        End Using

    Catch ex As Exception
        Dim msg As String = "Error reading page """ & url & """. " & ex.Message
        Logger.LogMessage(msg, LogOutputLevel.Diagnostics)
    End Try

    Return result

End Function

Have I missed something? Am I not closing or disposing of an object that should be? It seems strange that it always happens after ten consecutive requests.

Notes:

  1. In the constructor for the class in which this method resides I have the following:

    ServicePointManager.DefaultConnectionLimit = 100

  2. If I set KeepAlive to true, the timeouts begin after five requests.

  3. All the requests are for pages in the same domain.

EDIT

I added a delay of between two and seven seconds between web requests so that I do not appear to be "hammering" the site or attempting a DoS attack. However, the problem still occurs.
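For reference, the delay is just a Thread.Sleep with a randomized interval, along these lines (a sketch, not the exact code; the names are illustrative):

    ' Sketch: pause for a random 2-7 seconds between requests.
    ' A single Random instance is held at class level so intervals vary.
    Private ReadOnly rng As New Random()

    Private Sub PauseBetweenRequests()
        Threading.Thread.Sleep(rng.Next(2000, 7001))
    End Sub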

Upvotes: 7

Views: 8199

Answers (6)

Ldaniel

Reputation: 11

If the server uses a database and does not close each database connection properly, you may receive an error (e.g. status code 502) when the maximum connection limit is reached, until the database connection times out. One solution in this case is simply to sleep the web-request thread for a given time. You should also ensure that each request and response stream is closed after processing, ideally with a Using statement.
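A sketch of that pattern (sleeping briefly before the request and letting Using dispose the response and its stream; the function name and the 3-second pause are illustrative):

    ' Illustrative only: pause, then fetch, disposing the response via
    ' Using even if reading throws.
    Public Function FetchWithPause(ByVal url As String) As String
        Threading.Thread.Sleep(3000) ' give the server time to free its connections

        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
            Using reader As New IO.StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function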

Upvotes: 1

blue18hutthutt

Reputation: 3243

I know this is an old question, but I recently had this problem myself (due to my target environment using 4.0 and not allowing any external assembly references)

I did some digging, however, and found a fix of sorts that is interesting from a .NET inner-workings perspective.

ServicePointManager.DefaultConnectionLimit = 100;

ServicePointManager internally manages the actual HTTP connections created by multiple HttpWebRequest objects. The problem is that these don't get closed automatically, and an HttpWebRequest doesn't get garbage collected immediately.

So I found something very interesting: if I make the HttpWebRequest an instance-level variable AND force garbage collection after switching the reference out, it works (without the DefaultConnectionLimit = 100 hack).

private HttpWebRequest Request { get; set; }

public void MyMethod() {
   // Replace the old request, then force a collection so its
   // underlying connection is released.
   Request = (HttpWebRequest)HttpWebRequest.Create("http://myUrl");
   GC.Collect();
   GC.WaitForFullGCComplete();
}

Before this I was creating a new local variable in the method each time. This seemed to fix my problem - probably a little too late to help you, but I thought I'd share in case anyone else comes across this.

Upvotes: 1

fernando roque

Reputation: 21

I used the following solution and it works for me. Hope it helps you too.

Declare the variables as fields ("globals") on the form:

HttpWebRequest myHttpWebRequest;
HttpWebResponse myHttpWebResponse;

Then always call myHttpWebResponse.Close(); after each connection:

myHttpWebRequest = (HttpWebRequest)WebRequest.Create(url);
myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();
// ... read the response stream here ...
myHttpWebResponse.Close();

Upvotes: 2

tmaj

Reputation: 35135

myRequest.Connection = "Close"; will make the server close the connection, which in turn makes the connection manager close it too.

Upvotes: 0

Geoff

Reputation: 3769

I ran into this issue today and my resolution was to ensure that the response was closed at all times.

I think you need to put a response.Close() before you throw your exception inside the Using.

Using response As HttpWebResponse = DirectCast(request.GetResponse, HttpWebResponse)
    Using dataStream As Stream = response.GetResponseStream()
        Using reader As New StreamReader(dataStream)
            If response.StatusCode <> HttpStatusCode.OK Then
                response.Close()
                Throw New Exception("Got response status code: " & response.StatusCode.ToString())
            End If
            result = reader.ReadToEnd()
        End Using
    End Using
    response.Close()
End Using

Upvotes: 4

Paul van Brenk

Reputation: 7559

I think the site has some sort of DoS protection, which kicks in when it's hit with a number of rapid requests. You may want to try setting the UserAgent on the web request.
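For example (the User-Agent string here is just a placeholder):

    ' Identify the client explicitly; some sites throttle requests
    ' that arrive with no User-Agent header.
    Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
    request.UserAgent = "Mozilla/5.0 (compatible; MyCrawler/1.0)"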

Upvotes: 3
