David Eyk
David Eyk

Reputation: 12521

Introspection into forever-running salt highstate?

I've been experimenting with Salt, and I've managed to lock up my highstate command. It's been running for hours now, even though there's nothing that warrants that kind of time.

The last change I made was to modify the service.watch state for nginx. It currently reads:

nginx:
  pkg.installed:
    - name: nginx
  service:
    - running
    - enable: True
    - restart: True
    - watch:
      - file: /etc/nginx/nginx.conf
      - file: /etc/nginx/sites-available/default.conf
      - pkg: nginx

The last change I made was to add the second file: argument to watch.

After letting it run all night, with no change in state, I subsequently Ctrl-C'd the process. The last output from sudo salt -v 'web*' state.highstate -l debug was:

[DEBUG   ] Checking whether jid 20140403022217881027 is still running
[DEBUG   ] get_returns for jid 20140403103702550977 sent to set(['web1.mysite.com']) will timeout at 10:37:04
[DEBUG   ] jid 20140403103702550977 found all minions
Execution is still running on web1.mysite.com
^CExiting on Ctrl-C
This job's jid is:                                                                                                                                     
20140403022217881027
The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later run:
salt-run jobs.lookup_jid 20140403022217881027

Running it again immediately, I got this:

$ sudo salt -v 'web*' state.highstate -l debug
[DEBUG   ] Reading configuration from /etc/salt/master
[DEBUG   ] Missing configuration file: /home/eykd/.salt
[DEBUG   ] Configuration file path: /etc/salt/master
[DEBUG   ] Reading configuration from /etc/salt/master
[DEBUG   ] Missing configuration file: /home/eykd/.salt
[DEBUG   ] LocalClientEvent PUB socket URI: ipc:///var/run/salt/master/master_event_pub.ipc
[DEBUG   ] LocalClientEvent PULL socket URI: ipc:///var/run/salt/master/master_event_pull.ipc
Executing job with jid 20140403103715454952
-------------------------------------------

[DEBUG   ] Checking whether jid 20140403103715454952 is still running
[DEBUG   ] get_returns for jid 20140403103720479720 sent to set(['web1.praycontinue.com']) will timeout at 10:37:22
[INFO    ] jid 20140403103720479720 minions set(['web1.mysite.com']) did not return in time
[DEBUG   ] Loaded no_out as virtual quiet
[DEBUG   ] Loaded json_out as virtual json
[DEBUG   ] Loaded yaml_out as virtual yaml
[DEBUG   ] Loaded pprint_out as virtual pprint
web1.praycontinue.com:
    Minion did not return

I then ran the same command, and received this:

$ sudo salt -v 'web*' state.highstate -l debug
[DEBUG   ] Reading configuration from /etc/salt/master
[DEBUG   ] Missing configuration file: /home/eykd/.salt
[DEBUG   ] Configuration file path: /etc/salt/master
[DEBUG   ] Reading configuration from /etc/salt/master
[DEBUG   ] Missing configuration file: /home/eykd/.salt
[DEBUG   ] LocalClientEvent PUB socket URI: ipc:///var/run/salt/master/master_event_pub.ipc
[DEBUG   ] LocalClientEvent PULL socket URI: ipc:///var/run/salt/master/master_event_pull.ipc
Executing job with jid 20140403103729848942
-------------------------------------------

[DEBUG   ] Loaded no_out as virtual quiet
[DEBUG   ] Loaded json_out as virtual json
[DEBUG   ] Loaded yaml_out as virtual yaml
[DEBUG   ] Loaded pprint_out as virtual pprint
web1.mysite.com:
    Data failed to compile:
----------
    The function "state.highstate" is running as PID 4417 and was started at 2014, Apr 03 02:22:17.881027 with jid 20140403022217881027

There is no process running under PID 4417. Running sudo salt-run jobs.lookup_jid 20140403022217881027 displays nothing.

Unfortunately, I can't connect to the minion via ssh, as salt hasn't provisioned my authorized_keys yet. :\

So, to my question: what the heck is wrong, and how in the world do I find that out?

Upvotes: 2

Views: 2525

Answers (2)

Lucas Meadows
Lucas Meadows

Reputation: 1

I had this happen when I aborted a run of state.highstate on the salt-master with Ctrl-C. It turned out that the PID referenced in the error message was actually the PID of a salt-minion process on the minion machine.

I was able to resolve the problem by restarted the salt-minion process on the minion, and then re-executing state.highstate on the master.

Upvotes: 0

David Eyk
David Eyk

Reputation: 12521

So, after a lot of debugging, this was a result of an improperly configured Nginx service. service nginx start was hanging, and thus so was salt-minion.

Upvotes: 1

Related Questions