Alex J

Reputation: 3183

Postgres not starting on swarm server reboot

I'm trying to run an app using Docker Swarm. The app is designed to run completely locally, on a single computer, under swarm.

If I SSH into the server and run a docker stack deploy everything works, as seen here running docker service ls:

result of docker service ls. all services up

When this deployment works, the services generally go live in this order:

  1. Registry (a private registry)
  2. Main (an Nginx service) and Postgres
  3. All other services in random order (all Node apps)

The problem I am having is on reboot. When I reboot the server, I pretty consistently have the issue of the services failing with this result:

after server reboot

I am getting some errors that could be helpful.

In Postgres: docker service logs APP_NAME_postgres -f:

docker service logs APP_NAME_postgres -f after reboot

In Docker logs: sudo journalctl -fu docker.service

sudo journalctl -fu docker.service after reboot

Update (June 5th, 2019): By request from a GitHub issue, here is the docker version output:

Client:
 Version:           18.09.5
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        e8ff056
 Built:             Thu Apr 11 04:43:57 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.5
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       e8ff056
  Built:            Thu Apr 11 04:10:53 2019
  OS/Arch:          linux/amd64
  Experimental:     false

And docker info output:

Containers: 28
 Running: 9
 Paused: 0
 Stopped: 19
Images: 14
Server Version: 18.09.5
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
 NodeID: pbouae9n1qnezcq2y09m7yn43
 Is Manager: true
 ClusterID: nq9095ldyeq5ydbsqvwpgdw1z
 Managers: 1
 Nodes: 1
 Default Address Pool: 10.0.0.0/8  
 SubnetSize: 24
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 1
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 192.168.0.47
 Manager Addresses:
  192.168.0.47:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84
runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.15.0-50-generic
Operating System: Ubuntu 18.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 3.68GiB
Name: oeemaster
ID: 76LH:BH65:CFLT:FJOZ:NCZT:VJBM:2T57:UMAL:3PVC:OOXO:EBSZ:OIVH
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

WARNING: No swap limit support

And finally, My docker swarm stack/compose file:

secrets:
  jwt-secret:
    external: true
  pg-db:
    external: true
  pg-host:
    external: true
  pg-pass:
    external: true
  pg-user:
    external: true
  ssl_dhparam:
    external: true
services:
  accounts:
    depends_on:
    - postgres
    - registry
    deploy:
      restart_policy:
        condition: on-failure
    environment:
      JWT_SECRET_FILE: /run/secrets/jwt-secret
      PG_DB_FILE: /run/secrets/pg-db
      PG_HOST_FILE: /run/secrets/pg-host
      PG_PASS_FILE: /run/secrets/pg-pass
      PG_USER_FILE: /run/secrets/pg-user
    image: 127.0.0.1:5000/local-oee-master-accounts:v0.8.0
    secrets:
    - source: jwt-secret
    - source: pg-db
    - source: pg-host
    - source: pg-pass
    - source: pg-user
  graphs:
    depends_on:
    - postgres
    - registry
    deploy:
      restart_policy:
        condition: on-failure
    environment:
      PG_DB_FILE: /run/secrets/pg-db
      PG_HOST_FILE: /run/secrets/pg-host
      PG_PASS_FILE: /run/secrets/pg-pass
      PG_USER_FILE: /run/secrets/pg-user
    image: 127.0.0.1:5000/local-oee-master-graphs:v0.8.0
    secrets:
    - source: pg-db
    - source: pg-host
    - source: pg-pass
    - source: pg-user
  health:
    depends_on:
    - postgres
    - registry
    deploy:
      restart_policy:
        condition: on-failure
    environment:
      PG_DB_FILE: /run/secrets/pg-db
      PG_HOST_FILE: /run/secrets/pg-host
      PG_PASS_FILE: /run/secrets/pg-pass
      PG_USER_FILE: /run/secrets/pg-user
    image: 127.0.0.1:5000/local-oee-master-health:v0.8.0
    secrets:
    - source: pg-db
    - source: pg-host
    - source: pg-pass
    - source: pg-user
  live-data:
    depends_on:
    - postgres
    - registry
    deploy:
      restart_policy:
        condition: on-failure
    image: 127.0.0.1:5000/local-oee-master-live-data:v0.8.0
    ports:
    - published: 32000
      target: 80
  main:
    depends_on:
    - accounts
    - graphs
    - health
    - live-data
    - point-logs
    - registry
    deploy:
      restart_policy:
        condition: on-failure
    environment:
      MAIN_CONFIG_FILE: nginx.local.conf
    image: 127.0.0.1:5000/local-oee-master-nginx:v0.8.0
    ports:
    - published: 80
      target: 80
    - published: 443
      target: 443
  modbus-logger:
    depends_on:
    - point-logs
    - registry
    deploy:
      restart_policy:
        condition: on-failure
    environment:
      CONTROLLER_ADDRESS: 192.168.2.100
      SERVER_ADDRESS: http://point-logs
    image: 127.0.0.1:5000/local-oee-master-modbus-logger:v0.8.0
  point-logs:
    depends_on:
    - postgres
    - registry
    deploy:
      restart_policy:
        condition: on-failure
    environment:
      ENV_TYPE: local
      PG_DB_FILE: /run/secrets/pg-db
      PG_HOST_FILE: /run/secrets/pg-host
      PG_PASS_FILE: /run/secrets/pg-pass
      PG_USER_FILE: /run/secrets/pg-user
    image: 127.0.0.1:5000/local-oee-master-point-logs:v0.8.0
    secrets:
    - source: pg-db
    - source: pg-host
    - source: pg-pass
    - source: pg-user
  postgres:
    depends_on:
    - registry
    deploy:
      restart_policy:
        condition: on-failure
        window: 120s
    environment:
      POSTGRES_PASSWORD: password
    image: 127.0.0.1:5000/local-oee-master-postgres:v0.8.0
    ports:
    - published: 5432
      target: 5432
    volumes:
    - /media/db_main/postgres_oee_master:/var/lib/postgresql/data:rw
  registry:
    deploy:
      restart_policy:
        condition: on-failure
    image: registry:2
    ports:
    - mode: host
      published: 5000
      target: 5000
    volumes:
    - /mnt/registry:/var/lib/registry:rw
version: '3.2'

Things I've tried

Current Workaround

Currently, I have hacked together a solution that works just so I can move on to the next thing. I wrote a shell script called recreate.sh that kills the failed first-boot version of the stack, waits for it to fully tear down, and then "manually" runs docker stack deploy again. I set the script to run at boot with crontab @reboot. This works for shutdowns and reboots, but I don't consider it the proper answer, so I won't add it as one.
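For reference, the workaround described above could be sketched roughly like this. The stack name and compose file path are assumptions; substitute your own:

```shell
#!/bin/sh
# recreate.sh -- sketch of the reboot workaround, run via crontab @reboot.
# STACK and COMPOSE_FILE are placeholders, not values from the question.
STACK=APP_NAME
COMPOSE_FILE=/home/user/app/docker-compose.yml

# Remove the failed first-boot deployment.
docker stack rm "$STACK"

# Wait until the stack's services are actually gone
# (docker stack ps exits non-zero once nothing is found in the stack).
while docker stack ps "$STACK" >/dev/null 2>&1; do
    sleep 2
done
sleep 10   # give overlay networks a moment to be cleaned up

# Redeploy.
docker stack deploy -c "$COMPOSE_FILE" "$STACK"
```

The extra sleep after teardown matters in practice: docker stack rm returns before overlay networks are removed, and an immediate redeploy can fail with network-in-use errors.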

Upvotes: 5

Views: 747

Answers (1)

Miq

Reputation: 4279

It looks to me like you need to check who/what is killing the postgres service. From the logs you posted, it seems that postgres receives a smart shutdown signal and then stops gracefully. Your stack file has the restart policy set to "on-failure", and since the postgres process stops gracefully (exit code 0), Docker does not consider this a failure and, as instructed, does not restart it.

In conclusion, I'd recommend changing the restart policy from "on-failure" to "any".
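Applied to the postgres service from the question's stack file, that would be a one-line change (everything else stays as it is):

```yaml
  postgres:
    deploy:
      restart_policy:
        condition: any    # was: on-failure; "any" also restarts after a clean exit (code 0)
        window: 120s
```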

Also, keep in mind that the "depends_on" settings you use are ignored in swarm mode, so your services/images need their own way of ensuring proper startup order, or they must be able to cope when dependent services are not up yet.
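One common way to do that is an entrypoint wrapper in each Node image that blocks until Postgres accepts connections. This is a sketch, not the app's actual entrypoint; the host/port match the stack above, and "node server.js" is an assumed start command:

```shell
#!/bin/sh
# entrypoint.sh -- wait for postgres before starting the app.
# "postgres" resolves via the stack's overlay network.
until nc -z postgres 5432; do
    echo "waiting for postgres..."
    sleep 2
done
exec node server.js "$@"
```

A retry loop inside the app's database client (reconnect with backoff) achieves the same thing without changing the image's entrypoint.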

There's one more thing you could try - healthchecks. Perhaps your postgres base image has a healthcheck defined, and it is terminating the container by sending a kill signal to it. As written earlier, postgres then shuts down gracefully, there's no error exit code, and the restart policy does not trigger. Try disabling the healthcheck in your yaml, or look in the Dockerfiles for a HEALTHCHECK directive and figure out why it triggers.
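To check whether the image actually bakes in a healthcheck, you can inspect its config (image name taken from the question; "null" output means no healthcheck is defined):

```shell
# Print any HEALTHCHECK defined in the postgres image's config.
docker image inspect --format '{{json .Config.Healthcheck}}' \
    127.0.0.1:5000/local-oee-master-postgres:v0.8.0
```

If one is defined, compose file format 3.x (the question uses 3.2) lets you turn it off per service with `healthcheck: disable: true` under the postgres service.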

Upvotes: 2
