Easterwood

Reputation: 663

Is a ruby_block executed repeatedly when retries is above 0 and ignore_failure is true?

I have a Chef recipe with a ruby_block. The block should be retried until a socket connection can be established (retries 10). If no connection can be established, the ruby_block should not fail the Chef run (ignore_failure).

Example:

ruby_block 'wait for service' do
  block do
    require 'socket'
    require 'timeout'
    Timeout.timeout(2) do
      s = TCPSocket.new('127.0.0.1', 8080)
      s.close
    end
  end
  retries 10
  retry_delay 5
  ignore_failure true
  action :run
end

The Chef documentation isn't clear about whether the ruby_block is retried when ignore_failure is set to true.

Update

When the recipe is executed and no service is listening on port 8080, the Chef run continues after the first attempt with the following message:

ERROR: ruby_block[wait for service] (cookbook::wait_for_service line 1) had an error: Errno::ECONNREFUSED: Connection refused - connect(2) for "127.0.0.1" port 8080; ignore_failure is set, continuing

...

Error executing action run on resource 'ruby_block[wait for service]'

Errno::ECONNREFUSED
-------------------
Connection refused - connect(2) for "127.0.0.1" port 8080

...

Given the ruby_block declaration, I would think that the block is executed 10 times before an ERROR is reported.

Upvotes: 1

Views: 684

Answers (2)

xleon90

Reputation: 1316

I've tested your scenario with Chef 12.19.36, and indeed, if both ignore_failure and retries are specified, only ignore_failure is applied and retries is ignored.

The Chef documentation isn't clear about this specific scenario either, so combining the two properties will not solve your issue.

However, you can work around it by implementing the retries and retry_delay logic manually, as follows:

ruby_block 'wait for service' do
  block do
    require 'socket'
    require 'timeout'

    retry_delay = 5
    retries = 10

    1.upto(retries) do |n|
      err_msg = ''
      begin
        # retry_delay doubles as the connect timeout, as in the original test.
        Timeout.timeout(retry_delay) do
          # Replace host and port with those of your service.
          s = TCPSocket.new('8.8.8.8', 52)
          s.close
        end
        puts 'Service is listening'
        break # leave the retry loop, not just the timeout block
      rescue Errno::ECONNREFUSED
        err_msg = 'Connection refused: no service is listening on the port'
      rescue Errno::EHOSTUNREACH
        err_msg = 'Unable to reach the host'
      rescue Timeout::Error
        err_msg = 'Timeout reached'
      end

      # Only reached when the attempt failed.
      raise "Unable to reach server in #{retries} attempts" if n == retries

      puts "Failed to reach server on attempt [#{n}/#{retries}]. Cause is: [#{err_msg}]. Waiting #{retry_delay} seconds before retrying."
      sleep(retry_delay)
    end
  end
  ignore_failure true
  action :run
end

You can also improve the code by extracting a common function execute_with_retry that takes a lambda as input, so the logic can be reused across your recipes when needed.
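A minimal sketch of such a helper could look like the following. The name execute_with_retry comes from the suggestion above; the keyword arguments, the defaults, and the choice to rescue StandardError are my own assumptions, not anything Chef provides. The probe lambda reuses the host and port from the question.

```ruby
require 'socket'
require 'timeout'

# Hypothetical helper (not part of Chef): runs the given block and retries
# it on any StandardError, up to `retries` attempts in total.
def execute_with_retry(retries: 10, retry_delay: 5)
  attempt = 0
  begin
    attempt += 1
    yield
  rescue StandardError => e
    # Re-raise the last error once all attempts are used up.
    raise if attempt >= retries

    puts "Attempt [#{attempt}/#{retries}] failed: #{e.class}: #{e.message}. " \
         "Retrying in #{retry_delay} seconds."
    sleep(retry_delay)
    retry
  end
end

# The socket check from the question, wrapped in a lambda.
probe = lambda do
  Timeout.timeout(2) do
    TCPSocket.new('127.0.0.1', 8080).close
  end
end
```

Inside the ruby_block, the body of block do ... end would then reduce to execute_with_retry(retries: 10, retry_delay: 5, &probe), with ignore_failure true left on the resource so that the final error does not abort the run. Where the helper lives (for example in a cookbook library file) is up to you.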

Upvotes: 1

Easterwood

Reputation: 663

My test proves that retries has no effect: the block is executed only once, although the socket connection fails.

Upvotes: 1
