user2577226

Reputation: 123

Best practice for testing supervisors in Elixir

I've googled around a fair bit and am unable to find anything on the subject - either Elixir is too young a language or I'm searching with the wrong terms.

I'm working through Jose Valim's Elixir Portal tutorial (https://howistart.org/posts/elixir/1) and am building tests for practice (I've built all the functionality).

Part of the tutorial is building a Supervisor to make the Portal.Door module fault tolerant.

I'm trying to test the fault tolerance (e.g. that the Supervisor restarts the Portal.Door instance if it is shut down improperly) using the following code:

defmodule PortalTest do
  use ExUnit.Case, async: true

  ...

  test "supervisor restarts doors" do 
    {:ok, pid} = Portal.shoot(:third)
    Process.unlink(pid)
    Process.exit(pid, :shutdown)
    assert Portal.Door.get(:third) == [] # new doors start with an empty list
  end

end

But I keep getting this error when I run the test:

  1) test supervisor restarts doors (PortalTest)
     test/portal_test.exs:35
     ** (exit) exited in: GenServer.call(:third, {:get, #Function<3.47016826/1 in Portal.Door.get/1>}, 5000)
         ** (EXIT) shutdown
     stacktrace:
       (elixir) lib/gen_server.ex:356: GenServer.call/3
       test/portal_test.exs:39

So, I'm wondering if there's a better way to do this or my code is simply bad.

Upvotes: 10

Views: 3295

Answers (2)

hunmonk

Reputation: 106

Here's a working code example, based largely on the tips @sasajuric provided.

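defmodule Namer.Worker.Test do
  use ExUnit.Case

  test "supervisor restarts worker on server crash" do
    pid = Process.whereis(Namer.Worker)
    # Monitor before killing so the :DOWN message cannot be missed.
    ref = Process.monitor(pid)
    Process.exit(pid, :kill)

    receive do
      {:DOWN, ^ref, :process, ^pid, :killed} ->
        # Give the supervisor a moment to restart the worker.
        :timer.sleep(1)
        assert is_pid(Process.whereis(Namer.Worker))
    after
      1000 ->
        # flunk/1 fails the test; `raise :timeout` would itself error,
        # because :timeout is not an exception module.
        flunk("worker did not terminate within 1000 ms")
    end
  end
end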

Upvotes: 3

sasajuric

Reputation: 6059

Process.exit/2 sends an exit signal but doesn't wait for the process to stop. Judging by your error output, Portal.Door.get/1 then fails because the GenServer process terminates before it handles the call message.

To overcome this, you need to wait for the process to shut down and to be restarted. A simple remedy might be a brief sleep (say 100 ms) via :timer.sleep/1 after you issue the exit signal.

A more involved approach is to wait explicitly for the process to terminate, and then for it to be restarted. The first part is easily done by setting up a monitor (via Process.monitor/1) and waiting for the corresponding :DOWN message. By doing this, you also verify that the target process has indeed terminated.
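As a rough sketch of the monitor step, something like the following should work. The spawned process here is only a hypothetical stand-in for the door, not code from the tutorial, and the 1000 ms timeout is an arbitrary choice.

```elixir
# Hypothetical stand-in for the process under test.
pid = spawn(fn -> Process.sleep(:infinity) end)

# Set up the monitor before sending the exit signal,
# so the :DOWN message cannot be missed.
ref = Process.monitor(pid)
Process.exit(pid, :shutdown)

# Block until the monitor reports termination, or give up after 1000 ms.
reason =
  receive do
    {:DOWN, ^ref, :process, ^pid, reason} -> reason
  after
    1_000 -> raise "process did not terminate within 1000 ms"
  end

IO.inspect(reason, label: "terminated with")
```

Once the :DOWN message has arrived, the old process is gone, so any later successful call must have been served by a restarted one.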

Then you need to wait for the process to be restarted, so you can issue a request. This can be tricky, and sleeping for a brief time is probably the easiest option. Alternatively, if the process is registered under a local alias, you could poll with Process.whereis/1 until you get a non-nil value, at which point you know that the process is running again.
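A minimal sketch of the polling idea, under some assumptions: RestartHelper, await_restart and the :demo_door Agent are hypothetical names invented for this example, and the 10 ms interval and 100 attempts are arbitrary. It also compares against the old pid, so a stale lookup is not mistaken for a restart.

```elixir
defmodule RestartHelper do
  # Hypothetical helper: polls Process.whereis/1 until `name` is registered
  # to a pid other than `old_pid`, or gives up after `attempts` tries.
  def await_restart(name, old_pid, attempts \\ 100)

  def await_restart(_name, _old_pid, 0), do: {:error, :timeout}

  def await_restart(name, old_pid, attempts) do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        {:ok, pid}

      _ ->
        Process.sleep(10)
        await_restart(name, old_pid, attempts - 1)
    end
  end
end

# Stand-in for a supervised, locally registered process such as a door.
children = [
  %{id: :demo_door, start: {Agent, :start_link, [fn -> [] end, [name: :demo_door]]}}
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

old_pid = Process.whereis(:demo_door)
ref = Process.monitor(old_pid)
Process.exit(old_pid, :kill)

# First wait for termination, then poll for the restarted process.
receive do
  {:DOWN, ^ref, :process, ^old_pid, _reason} -> :ok
after
  1_000 -> raise "process did not terminate within 1000 ms"
end

{:ok, new_pid} = RestartHelper.await_restart(:demo_door, old_pid)

# The restarted process begins with fresh state.
state = Agent.get(:demo_door, fn list -> list end)
```

In a test, the same helper could be called between the exit signal and the final assertion, in place of a fixed sleep.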

Upvotes: 5
