MrE

Reputation: 20808

CoreOS etcd with TLS: unit fails but no reason in logs

I am trying to set up etcd with TLS on my CoreOS cluster, and I'm having a hell of a time.

I went through the various guides and generated both client and peer certs and keys.

etcd fails to start, and this is all I get in journalctl (IPs and token obfuscated):

Dec 16 00:05:12 coreos-123.123.123.123 systemd[1]: Starting etcd2...
-- Subject: Unit etcd2.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit etcd2.service has begun starting up.
Dec 16 00:05:12 coreos-123.123.123.123 etcd2[822]: recognized and used environment variable ETCD_ADVERTISE_CLIENT_URLS=http://123.123.123.123:2379
Dec 16 00:05:12 coreos-123.123.123.123 etcd2[822]: recognized and used environment variable ETCD_CERT_FILE=/etc/ssl/etcd/etcd-client123.123.123.123.cert.pem
Dec 16 00:05:12 coreos-123.123.123.123 etcd2[822]: recognized and used environment variable ETCD_CLIENT_CERT_AUTH=true
Dec 16 00:05:12 coreos-123.123.123.123 etcd2[822]: recognized and used environment variable ETCD_DATA_DIR=/var/lib/etcd2
Dec 16 00:05:12 coreos-123.123.123.123 etcd2[822]: recognized and used environment variable ETCD_DISCOVERY=https://discovery.etcd.io/xxxxxxxxxxxxxxxxxxxxxxxxxxxx
Dec 16 00:05:12 coreos-123.123.123.123 etcd2[822]: recognized and used environment variable ETCD_INITIAL_ADVERTISE_PEER_URLS=http://123.123.123.123:2380
Dec 16 00:05:12 coreos-123.123.123.123 etcd2[822]: recognized and used environment variable ETCD_KEY_FILE=/etc/ssl/etcd/private/etcd-client123.123.123.123.key.pem
Dec 16 00:05:12 coreos-123.123.123.123 etcd2[822]: recognized and used environment variable ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379,http://0.0.0.0:4001
Dec 16 00:05:12 coreos-123.123.123.123 etcd2[822]: recognized and used environment variable ETCD_LISTEN_PEER_URLS=http://123.123.123.123:2380,http://123.123.123.123:7001
Dec 16 00:05:12 coreos-123.123.123.123 etcd2[822]: recognized and used environment variable ETCD_NAME=yyyyyyyyyyyyyyyyyyyyyyyyyy
Dec 16 00:05:12 coreos-123.123.123.123 etcd2[822]: recognized and used environment variable ETCD_PEER_CERT_FILE=/etc/ssl/etcd/etcd-peer123.123.123.123.cert.pem
Dec 16 00:05:12 coreos-123.123.123.123 etcd2[822]: recognized and used environment variable ETCD_PEER_CLIENT_CERT_AUTH=true
Dec 16 00:05:12 coreos-123.123.123.123 etcd2[822]: recognized and used environment variable ETCD_PEER_KEY_FILE=/etc/ssl/etcd/private/etcd-peer123.123.123.123.key.pem
Dec 16 00:05:12 coreos-123.123.123.123 etcd2[822]: recognized and used environment variable ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/certs/ca-chain.cert.pem
Dec 16 00:05:12 coreos-123.123.123.123 etcd2[822]: recognized and used environment variable ETCD_TRUSTED_CA_FILE=/etc/ssl/certs/ca-chain.cert.pem
Dec 16 00:05:12 coreos-123.123.123.123 etcd2[822]: etcd Version: 2.2.0
Dec 16 00:05:12 coreos-123.123.123.123 etcd2[822]: Git SHA: e4561dd
Dec 16 00:05:12 coreos-123.123.123.123 etcd2[822]: Go Version: go1.4.2
Dec 16 00:05:12 coreos-123.123.123.123 etcd2[822]: Go OS/Arch: linux/amd64
Dec 16 00:05:12 coreos-123.123.123.123 etcd2[822]: setting maximum number of CPUs to 1, total number of available CPUs is 4
Dec 16 00:05:12 coreos-123.123.123.123 etcd2[822]: the server is already initialized as member before, starting as etcd member...
Dec 16 00:05:12 coreos-123.123.123.123 systemd[1]: etcd2.service: Main process exited, code=exited, status=1/FAILURE
Dec 16 00:05:12 coreos-123.123.123.123 systemd[1]: Failed to start etcd2.
-- Subject: Unit etcd2.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit etcd2.service has failed.
--
-- The result is failed.
Dec 16 00:05:12 coreos-123.123.123.123 systemd[1]: etcd2.service: Unit entered failed state.
Dec 16 00:05:12 coreos-123.123.123.123 systemd[1]: etcd2.service: Failed with result 'exit-code'.

I have the certs and keys in the right folders, and I'm pretty sure the permissions are fine. The peer cert has the clientAuth and serverAuth extensions, the client cert has clientAuth, and both have a SAN with the node IP.

Client cert data:

Exponent: 65537 (0x10001)
X509v3 extensions:
    X509v3 Basic Constraints:
        CA:FALSE
    Netscape Cert Type:
        SSL Client, S/MIME
    Netscape Comment:
        OpenSSL Generated Client Certificate
    X509v3 Subject Key Identifier:

    X509v3 Authority Key Identifier:
        keyid:

    X509v3 Key Usage: critical
        Digital Signature, Non Repudiation, Key Encipherment
    X509v3 Extended Key Usage:
        TLS Web Client Authentication, E-mail Protection
    X509v3 Subject Alternative Name:
        IP Address:127.0.0.1, IP Address:123.123.123.123

Peer cert data:

X509v3 extensions:
            X509v3 Basic Constraints:
                CA:FALSE
            Netscape Cert Type:
                SSL Server
            Netscape Comment:
                OpenSSL Generated Server Certificate
            X509v3 Subject Key Identifier:

            X509v3 Authority Key Identifier:
                keyid:

            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Subject Alternative Name:
                IP Address:127.0.0.1, IP Address:123.123.123.123
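The assumptions above (files readable, cert chains to the CA, key belongs to the cert) can be checked mechanically. This is only a rough sketch: the function name check_tls_pair is made up for illustration, and it relies on the standard openssl verify, openssl x509 and openssl pkey subcommands with unencrypted key files.

```shell
# check_tls_pair CERT KEY CA  (hypothetical helper, not part of etcd or CoreOS)
# Prints "ok: CERT" and returns 0 only if all three checks pass.
check_tls_pair() {
    ctp_cert="$1"; ctp_key="$2"; ctp_ca="$3"

    # 1. Every file must be readable by the user running this check.
    for ctp_f in "$ctp_cert" "$ctp_key" "$ctp_ca"; do
        [ -r "$ctp_f" ] || { echo "not readable: $ctp_f"; return 1; }
    done

    # 2. The certificate must chain up to the CA bundle.
    openssl verify -CAfile "$ctp_ca" "$ctp_cert" >/dev/null 2>&1 \
        || { echo "chain verification failed: $ctp_cert"; return 1; }

    # 3. The public key in the cert must match the private key.
    ctp_cert_pub=$(openssl x509 -in "$ctp_cert" -noout -pubkey 2>/dev/null | openssl sha256)
    ctp_key_pub=$(openssl pkey -in "$ctp_key" -pubout 2>/dev/null | openssl sha256)
    [ "$ctp_cert_pub" = "$ctp_key_pub" ] \
        || { echo "key does not match certificate: $ctp_key"; return 1; }

    echo "ok: $ctp_cert"
}
```

For the peer pair this would be called as check_tls_pair /etc/ssl/etcd/etcd-peer123.123.123.123.cert.pem /etc/ssl/etcd/private/etcd-peer123.123.123.123.key.pem /etc/ssl/certs/ca-chain.cert.pem. The readability check only means something when run as the user the service runs as, which I assume is etcd here.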

What else am I missing here? There is nothing in this log to explain the failure.

My goal is to have TLS authentication for both clients and peers, since the cluster is on a public cloud. PS: it worked fine without TLS. I only added the certs and the 8 TLS flags:

    # client flags
    trusted-ca-file: /etc/ssl/certs/ca-chain.cert.pem
    cert-file: /etc/ssl/etcd/etcd-client$public_ipv4.cert.pem
    key-file: /etc/ssl/etcd/private/etcd-client$public_ipv4.key.pem
    client-cert-auth: true

    # peer flags
    peer-trusted-ca-file: /etc/ssl/certs/ca-chain.cert.pem
    peer-cert-file: /etc/ssl/etcd/etcd-peer$public_ipv4.cert.pem
    peer-key-file: /etc/ssl/etcd/private/etcd-peer$public_ipv4.key.pem
    peer-client-cert-auth: true

The $public_ipv4 variable is clearly being substituted correctly, since the IP shows up in the logs.

I just can't tell what the problem is, since the logs don't say much.

Any ideas that could point me in the right direction?

Thanks

Upvotes: 0

Views: 290

Answers (1)

Chance Zibolski

Reputation: 63

Due to an upstream systemd bug, journald may miss the last few log lines of a process when that process exits. If journalctl shows etcd stopping without a fatal or panic message, try sudo journalctl -f -t etcd2 to get the full log.

Once you have the full log it should tell you what etcd is failing on.
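One way to make sure the follower is already running when the unit fails is to start it first and then restart the unit. The following is a minimal sketch, assuming it is run as root. The function name capture_unit_start and its arguments are invented for illustration; the only real commands involved are journalctl -f -t and systemctl restart.

```shell
# capture_unit_start IDENT UNIT OUTFILE [WAIT_SECONDS]  (hypothetical helper)
# Follows journal messages tagged IDENT, restarts UNIT, and saves what was
# logged during the restart to OUTFILE.
capture_unit_start() {
    cus_ident="$1"; cus_unit="$2"; cus_out="$3"; cus_wait="${4:-5}"

    # Start following by syslog identifier before the unit is restarted.
    journalctl -f -t "$cus_ident" > "$cus_out" 2>&1 &
    cus_follower=$!
    sleep 1

    # The restart is expected to fail here, so ignore its exit status.
    systemctl restart "$cus_unit" || true

    # Give the process time to log its last lines, then stop following.
    sleep "$cus_wait"
    kill "$cus_follower" 2>/dev/null || true
    wait "$cus_follower" 2>/dev/null || true
}
```

Calling capture_unit_start etcd2 etcd2.service /tmp/etcd2-start.log should leave the start-up messages, including any final fatal line that journald did record, in /tmp/etcd2-start.log.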

Upvotes: 1
