Sidekiq starts successfully, but systemd restarts it every ~1 minute anyway

Rails: 6.0.3

Sidekiq: 6.1.2

Ruby 2.7.2

Running on AWS Amazon Linux 2

I'm running a fairly simple Sidekiq configuration in production, using the boilerplate systemd/sidekiq.service file from the examples directory in the sidekiq repo.

I noticed that my workers cannot run long jobs because they are killed every minute or so. I tracked down what's happening: systemd is restarting sidekiq even though it started successfully. It appears that systemd never receives the message that the service started, so it kills the process.

Here are the logs:

sidekiq: 2021-06-01T23:30:56.510Z pid=24939 tid=gir INFO: Shutting down
sidekiq: 2021-06-01T23:30:56.511Z pid=24939 tid=4jxb INFO: Scheduler exiting...
systemd: Failed to start sidekiq.
systemd: Unit sidekiq.service entered failed state.
systemd: sidekiq.service failed.
sidekiq: 2021-06-01T23:30:56.513Z pid=24939 tid=gir INFO: Terminating quiet workers
sidekiq: 2021-06-01T23:30:56.513Z pid=24939 tid=4jvn INFO: Scheduler exiting...
sidekiq: 2021-06-01T23:30:57.015Z pid=24939 tid=gir INFO: Pausing to allow workers to finish...
sidekiq: 2021-06-01T23:30:57.516Z pid=24939 tid=gir INFO: Bye!
systemd: sidekiq.service holdoff time over, scheduling restart.
systemd: Starting sidekiq...
sidekiq: 2021-06-01T23:30:58.991Z pid=32046 tid=fs6 INFO: Enabling systemd notification integration
sidekiq: 2021-06-01T23:31:04.475Z pid=32046 tid=fs6 INFO: Booting Sidekiq 6.1.2 with redis options {:url=>"redis://******"}
sidekiq: 2021-06-01T23:31:08.869Z pid=32046 tid=fs6 INFO: Running in ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]
sidekiq: 2021-06-01T23:31:08.870Z pid=32046 tid=fs6 INFO: See LICENSE and the LGPL-3.0 for licensing details.
systemd: sidekiq.service: Got notification message from PID 32046, but reception only permitted for main PID 31981

Following these messages, the sidekiq worker will successfully perform the jobs from the queue for about 1 minute before it's restarted again. This cycle continues forever.

I've tried modifying the sidekiq.service file in a number of different ways, but nothing seems to do the trick. In particular, this line from the logs seems to indicate that the notification that sidekiq started correctly is arriving from a process ID that systemd does not accept: systemd: sidekiq.service: Got notification message from PID 32046, but reception only permitted for main PID 31981

Any ideas on how I can ensure that systemd accurately knows when a job succeeds/fails to start?

Here is my current systemd/sidekiq.service file:

#
# This file tells systemd how to run Sidekiq as a 24/7 long-running daemon.
#
# Customize this file based on your bundler location, app directory, etc.
# Customize and copy this into /usr/lib/systemd/system (CentOS) or /lib/systemd/system (Ubuntu).
# Then run:
#   - systemctl enable sidekiq
#   - systemctl {start,stop,restart} sidekiq
#
# This file corresponds to a single Sidekiq process.  Add multiple copies
# to run multiple processes (sidekiq-1, sidekiq-2, etc).
#
# Use `journalctl -u sidekiq -rn 100` to view the last 100 lines of log output.
#
[Unit]
Description=sidekiq
# start us only once the network and logging subsystems are available,
# consider adding redis-server.service if Redis is local and systemd-managed.
After=syslog.target network.target

# See these pages for lots of options:
#
#   https://www.freedesktop.org/software/systemd/man/systemd.service.html
#   https://www.freedesktop.org/software/systemd/man/systemd.exec.html
#
# THOSE PAGES ARE CRITICAL FOR ANY LINUX DEVOPS WORK; read them multiple
# times! systemd is a critical tool for all developers to know and understand.
#
[Service]
#
#      !!!!  !!!!  !!!!
#
# As of v6.0.6, Sidekiq automatically supports systemd's `Type=notify` and watchdog service
# monitoring. If you are using an earlier version of Sidekiq, change this to `Type=simple`
# and remove the `WatchdogSec` line.
#
#      !!!!  !!!!  !!!!
#
Type=simple
# If your Sidekiq process locks up, systemd's watchdog will restart it within seconds.
#WatchdogSec=10

EnvironmentFile=/opt/elasticbeanstalk/deployment/custom_env_var

WorkingDirectory=/var/app/current
# If you use rbenv:
# ExecStart=/bin/bash -lc 'exec /home/deploy/.rbenv/shims/bundle exec sidekiq -e production'
# If you use the system's ruby:
# ExecStart=/usr/local/bin/bundle exec sidekiq -e production
# If you use rvm in production without gemset and your ruby version is 2.6.5
# ExecStart=/home/deploy/.rvm/gems/ruby-2.6.5/wrappers/bundle exec sidekiq -e production
# If you use rvm in production with gemset and your ruby version is 2.6.5
ExecStart=/bin/bash -lc 'cd /var/app/current; bundle exec sidekiq -e production -r /var/app/current -C /var/app/current/config/sidekiq.yml'

# Use `systemctl kill -s TSTP sidekiq` to quiet the Sidekiq process

# !!! Change this to your deploy user account !!!
User=root
Group=root
UMask=0002

# Greatly reduce Ruby memory fragmentation and heap usage
# https://www.mikeperham.com/2018/04/25/taming-rails-memory-bloat/
Environment=MALLOC_ARENA_MAX=2

# if we crash, restart
RestartSec=1
Restart=on-failure

# output goes to /var/log/syslog (Ubuntu) or /var/log/messages (CentOS)
StandardOutput=syslog
StandardError=syslog

# This will default to "bundler" if we don't specify it
SyslogIdentifier=sidekiq

[Install]
WantedBy=multi-user.target

Upvotes: 2

Answers (3)

murb

Reputation: 1860

I ran into the same issue and tried different configurations, but for me the root problem lay in the systemd setup. I am running in user mode, and services run by a user are typically stopped when that user's session ends. For me the answer was:

loginctl enable-linger username

This is how I got to that. I ran sudo journalctl and saw messages like:

13:07:23  systemd[1]: Started user@1002.service - User Manager for UID 1002.
13:07:23  systemd[29642]: Started sidekiq.service - sidekiq (production).
13:07:23  systemd[29642]: Reached target default.target - Main User Target.
13:07:23  systemd[29642]: Startup finished in 177ms.
13:07:23  systemd[1]: Started session-83.scope - Session 83 of User appuser.
13:07:23  sshd[29639]: pam_env(sshd:session): deprecated reading of user environment enabled
13:07:24  sshd[29639]: pam_unix(sshd:session): session closed for user appuser
13:07:24  systemd[1]: session-83.scope: Deactivated successfully.
13:07:24  systemd[1]: session-83.scope: Consumed 1.132s CPU time.
13:07:24  systemd-logind[549]: Session 83 logged out. Waiting for processes to exit.
13:07:24  systemd-logind[549]: Removed session 83.
13:07:34  sshd[30228]: Accepted publickey for appuser from xxx.xxx.xxx.xxx port 58500 ssh2: RSA SHA256:xxxxxxxx
13:07:34  sshd[30228]: pam_unix(sshd:session): session opened for user appuser(uid=1002) by (uid=0)
13:07:34  systemd-logind[549]: New session 85 of user appuser.
13:07:34  systemd[1]: Started session-85.scope - Session 85 of User appuser.
13:07:35  sshd[30228]: pam_unix(sshd:session): session closed for user appuser
13:07:35  systemd[1]: session-85.scope: Deactivated successfully.
13:07:35  systemd[1]: session-85.scope: Consumed 1.376s CPU time.
13:07:35  systemd-logind[549]: Session 85 logged out. Waiting for processes to exit.
13:07:35  systemd-logind[549]: Removed session 85.
13:07:45  systemd[1]: Stopping user@1002.service - User Manager for UID 1002...
13:07:45  systemd[29642]: Activating special unit exit.target...
13:07:45  systemd[29642]: Stopped target default.target - Main User Target.
13:07:45  systemd[29642]: Stopping sidekiq.service - sidekiq (production)...
13:07:46  systemd[29642]: Stopped sidekiq.service - sidekiq (production).
13:07:46  systemd[29642]: sidekiq.service: Consumed 5.123s CPU time.
13:07:46  systemd[29642]: Stopped target basic.target - Basic System.
13:07:46  systemd[29642]: Stopped target paths.target - Paths.
13:07:46  systemd[29642]: Stopped target sockets.target - Sockets.

(Here I am polling the status of sidekiq from my own machine over SSH; each login briefly brings it to life until it dies again.)

Here I saw that the end of the session stopped the user manager, which in turn stopped the services run by that user. And then I remembered to enable lingering again, prompted also by the comments here: https://github.com/systemd/systemd/issues/8486#issuecomment-374502122
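
To check whether lingering is already on for a user, something like the following may help. It is only a sketch: `linger_enabled` is a hypothetical helper, and it assumes logind records lingering as a marker file under /var/lib/systemd/linger, which is the usual location but may differ on your system.

```shell
#!/bin/sh
# Hypothetical helper: succeed if lingering is enabled for user $1.
# Assumes logind's usual marker-file location; adjust if your system differs.
linger_enabled() {
  [ -e "/var/lib/systemd/linger/$1" ]
}

# Usage: report the state for the current user.
user=$(id -un)
if linger_enabled "$user"; then
  echo "linger is enabled for $user"
else
  echo "linger is NOT enabled for $user; try: loginctl enable-linger $user"
fi
```

If the marker is missing, the user manager (and every user service with it) goes away shortly after the last session closes, which matches the log above.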

Upvotes: 0

pbacterio

Reputation: 1152

Maybe this works in your case:

Type=notify
# or NotifyAccess=exec
NotifyAccess=all
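
One way to try this without editing the main unit file is a drop-in. Note that the systemd directive is spelled `NotifyAccess=`. The sketch below is an assumption-laden illustration: `write_notify_dropin` and the file name `notify.conf` are made up for this example, and the drop-in directory shown in the usage comment assumes a system-level unit named sidekiq.service. `all` lets systemd accept the ready notification from any process in the service, which is what the "reception only permitted for main PID" message in the question complains about. `exec` may not be understood by older systemd releases, so `all` is probably the safer first try.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that overrides only the notify settings.
# $1 is the drop-in directory, e.g. /etc/systemd/system/sidekiq.service.d
write_notify_dropin() {
  mkdir -p "$1" || return 1
  cat > "$1/notify.conf" <<'EOF'
[Service]
Type=notify
NotifyAccess=all
EOF
}

# Usage (as root), followed by a reload so systemd picks up the change:
#   write_notify_dropin /etc/systemd/system/sidekiq.service.d
#   systemctl daemon-reload
#   systemctl restart sidekiq
```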

Upvotes: 0

Mike Perham

Reputation: 22208

Change ExecStart to:

ExecStart=/direct/path/to/bundle exec sidekiq -e production

Everything else in that line appears superfluous.
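
A likely reason the wrapper matters: with `/bin/bash -lc 'cd ...; bundle exec sidekiq ...'`, bash can stay alive as the unit's main PID while Sidekiq runs as its child, which would explain the "reception only permitted for main PID" message in the question. The sketch below is a generic illustration of that effect, not something taken from the Sidekiq docs; `sh -c 'echo $$'` merely stands in for the real Sidekiq command.

```shell
#!/bin/bash
# Illustration only: the inner `sh` stands in for the real Sidekiq command.

# Without exec: bash forks the child and stays alive as its parent,
# so the two PIDs differ. The trailing `true` stops newer bash versions
# from exec-ing the last command of the -c string on their own.
no_exec=$(bash -c 'echo $$; sh -c "echo \$\$"; true')

# With exec: the child replaces the wrapper shell and keeps its PID.
with_exec=$(bash -c 'echo $$; exec sh -c "echo \$\$"')

echo "without exec (wrapper, child):" $no_exec
echo "with exec    (wrapper, child):" $with_exec
```

If a login shell is really needed, the rbenv line in the boilerplate (`/bin/bash -lc 'exec ... bundle exec sidekiq -e production'`) uses `exec` so that Sidekiq takes over the shell's PID. The `cd` is redundant in any case, since `WorkingDirectory=` is already set in the unit.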

Upvotes: 0
