fgblomqvist

Reputation: 2424

Go http.Get, concurrency, and "Connection reset by peer"

I have between 1000 and 2000 web pages to download from one server, and I am using goroutines and channels to make this efficient. The problem is that every time I run my program, up to 400 requests fail with the error "connection reset by peer". Rarely (maybe 1 run out of 10), no requests fail.

What can I do to prevent this?

One interesting thing: when I ran this program on a server in the same country as the server hosting the website, 0 requests failed, so I am guessing the problem is related to latency (the program now runs on a server on a different continent).

The code I am using is basically just a simple http.Get(url) request, with no extra parameters and no custom client.

Upvotes: 29

Views: 55807

Answers (5)

xmh

Reputation: 135

On macOS, set these parameters:

# ulimit is a shell builtin, so run it without sudo in the shell that starts the program
ulimit -n 6049
sudo sysctl -w kern.ipc.somaxconn=1024

https://github.com/golang/go/issues/20960#issuecomment-465998114

On Linux, set:

# ulimit is a shell builtin, so run it without sudo in the shell that starts the program
ulimit -n 6049
sudo sysctl -w net.core.somaxconn=1024

Upvotes: 0

Mr_Pink

Reputation: 109347

The message connection reset by peer indicates that the remote server sent an RST to forcefully close the connection, either deliberately as a mechanism to limit connections, or as a result of a lack of resources. Either way you are likely opening too many connections, or reconnecting too fast.

Starting 1000-2000 connections in parallel is rarely the most efficient way to download that many pages, especially if most or all are coming from a single server. If you test the throughput you will find an optimal concurrency level that is far lower.

You will also want to set the Transport.MaxIdleConnsPerHost to match your level of concurrency. If MaxIdleConnsPerHost is lower than the expected number of concurrent connections, the server connections will often be closed after a request, only to be immediately opened again -- this will slow your progress significantly and possibly reach connection limits imposed by the server.

Upvotes: 45

JamesHalsall

Reputation: 13485

I had good results by setting the MaxConnsPerHost option on the transport:

cl := &http.Client{
    Transport: &http.Transport{MaxConnsPerHost: 50},
}

MaxConnsPerHost optionally limits the total number of connections per host, including connections in the dialing, active, and idle states. On limit violation, dials will block.

https://golang.org/pkg/net/http/#Transport.MaxConnsPerHost

EDIT: To clarify, this option was released in Go 1.11 which was not available at the time of @AG1's or @JimB's answers above, hence me posting this.

Upvotes: 6

AG1

Reputation: 6774

Still a golang newbie, hopefully this helps.

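import (
    "io"
    "net/http"
)

var netClient = &http.Client{}

func init() {
    tr := &http.Transport{
        MaxIdleConns:        20,
        MaxIdleConnsPerHost: 20,
    }
    netClient = &http.Client{Transport: tr}
}

func foo() error {
    resp, err := netClient.Get("http://www.example.com/")
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    // Read the body to the end so the connection can be reused.
    _, err = io.Copy(io.Discard, resp.Body)
    return err
}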

Upvotes: 23

Paritosh Gupta

Reputation: 37

The server you are downloading from may have some kind of throttling mechanism that rejects more than a certain number of requests per second (or similar) from a single IP. Try limiting yourself to around 100 requests per second, or adding a sleep between requests. "Connection reset by peer" basically means the server is denying you service. (What does "connection reset by peer" mean?)

Upvotes: 0
