JuanIsFree

Reputation: 370

Kubernetes on AWS Master node problems

After running Kubernetes on AWS for a few days, my master node dies. This has happened on 2 different clusters that I set up. The pods are still running and available, but there's no way to manage or proxy them.

The question is: why? Alternatively, how do I replace the master node on AWS, how do I debug the existing one, or how do I use something other than a t2.micro, which may be too small to run the master?

Symptom:

$ kubectl get pods
error: couldn't read version from server: Get https://**.###.###.###/api: dial tcp **.###.###.###:443: connection refused

Edit: This is what I found after further debugging:

goroutine 571 [running]:
net/http.func·018()
    /usr/src/go/src/net/http/transport.go:517 +0x2a
net/http.(*Transport).CancelRequest(0xc2083c0630, 0xc209750d00)
    /usr/src/go/src/net/http/transport.go:284 +0x97
github.com/coreos/go-etcd/etcd.func·003()
    /go/src/github.com/GoogleCloudPlatform/kubernetes/Godeps/_workspace/src/github.com/coreos/go-etcd/etcd/requests.go:159 +0x236
created by github.com/coreos/go-etcd/etcd.(*Client).SendRequest
    /go/src/github.com/GoogleCloudPlatform/kubernetes/Godeps/_workspace/src/github.com/coreos/go-etcd/etcd/requests.go:168 +0x3e3

goroutine 1 [IO wait, 12 minutes]:
net.(*pollDesc).Wait(0xc20870e760, 0x72, 0x0, 0x0)
    /usr/src/go/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc20870e760, 0x0, 0x0)
    /usr/src/go/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).accept(0xc20870e700, 0x0, 0x7f4424a42008, 0xc20930a168)
    /usr/src/go/src/net/fd_unix.go:419 +0x40b
net.(*TCPListener).AcceptTCP(0xc20804bec0, 0x5bccce, 0x0, 0x0)
    /usr/src/go/src/net/tcpsock_posix.go:234 +0x4e
net/http.tcpKeepAliveListener.Accept(0xc20804bec0, 0x0, 0x0, 0x0, 0x0)
    /usr/src/go/src/net/http/server.go:1976 +0x4c
net/http.(*Server).Serve(0xc20887ec60, 0x7f4424a66dc8, 0xc20804bec0, 0x0, 0x0)
    /usr/src/go/src/net/http/server.go:1728 +0x92
net/http.(*Server).ListenAndServe(0xc20887ec60, 0x0, 0x0)
    /usr/src/go/src/net/http/server.go:1718 +0x154
github.com/GoogleCloudPlatform/kubernetes/cmd/kube-apiserver/app.(*APIServer).Run(0xc2081f0e00, 0xc20806e0e0, 0x0, 0xe, 0x0, 0x0)
    /go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/cmd/kube-apiserver/app/server.go:484 +0x264a
main.main()
    /go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/cmd/kube-apiserver/apiserver.go:48 +0x154

Upvotes: 2

Views: 643

Answers (1)

aronchick

Reputation: 7128

Almost certainly the initial machine size was too small and the master ran out of memory (or something similar). To use a larger instance size, follow this link[1] and set an environment variable before you bring up your cluster.

In this case, something like:

export MINION_SIZE=t2.large

Should run forever.[2]
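
Since the question is about the master specifically, it may be worth sizing the master as well as the minions. The following is only a sketch: it assumes the AWS kube-up scripts described in the linked guide honor a MASTER_SIZE variable alongside MINION_SIZE (check the guide for your release), and the instance types are illustrative rather than recommendations.

```shell
# Select the AWS provider for the cluster scripts.
export KUBERNETES_PROVIDER=aws

# Assumption: MASTER_SIZE is read by the AWS scripts in your release;
# confirm the variable name against the guide before relying on it.
export MASTER_SIZE=t2.large

# Size of the worker (minion) instances, as in the line above.
export MINION_SIZE=t2.large

# Print the chosen sizes so they can be checked before bringing the cluster up.
echo "master=${MASTER_SIZE} minion=${MINION_SIZE}"
```

The variables only take effect if they are exported in the same shell session before the cluster is created, so an existing cluster would need to be brought down and recreated.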

[1] http://kubernetes.io/docs/getting-started-guides/aws/

[2] Or reasonable approximation thereof. :)

Upvotes: 1
