Reputation: 1
I'm trying to use Poolboy for a worker pool to make a large number of DNS requests. On some of these DNS requests, the DNS query times out, which throws an error and terminates the GenServer worker:
07:44:29.585 [error] GenServer #PID<0.382.0> terminating
** (Socket.Error) timeout
(socket 0.3.13) lib/socket/datagram.ex:46: Socket.Datagram.recv!/2
(dns 2.3.0) lib/dns.ex:76: DNS.query/4
(dmarc_hijack 0.1.0) lib/dmarc.ex:5: Dmarc.fetch_dmarc_record/1
(dmarc_hijack 0.1.0) lib/dmarc_hijack/worker.ex:16: DmarcHijack.Worker.handle_call/3
(stdlib 3.17.1) gen_server.erl:721: :gen_server.try_handle_call/4
(stdlib 3.17.1) gen_server.erl:750: :gen_server.handle_msg/6
(stdlib 3.17.1) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Last message (from #PID<0.717.0>): {:fetch_process_dmarc, "12580.tv"}
State: nil
Client #PID<0.717.0> is dead
Eventually, this leads to all of my Poolboy workers getting killed, and the Supervisor does not appear to restart the Worker GenServers. Application functionality then ceases as there are no more workers, but execution does not halt.
I'm try/catch-ing errors in the Poolboy task as well as the DNS client:
Poolboy task:
defp setup_task(domain) do
  Task.async(fn ->
    :poolboy.transaction(
      :worker,
      fn pid ->
        try do
          GenServer.call(pid, {:fetch_process_dmarc, domain})
        catch
          :exit, reason ->
            # Handle timeout
            Logger.warning("Probably just got a timeout on #{domain}. Real reason follows:")
            Logger.warning(inspect(reason))
            {domain, {:error, :timeout}}
        end
      end,
      @timeout
    )
  end)
end
DNS query code:
defmodule Dmarc do
  def fetch_dmarc_record(domain) do
    try do
      DNS.query("_dmarc.#{domain}", :txt, {select_random_dns_server(), 53})
      |> extract_dmarc_record_from_txt()
    catch
      error ->
        Logger.error(error)
        {:error, :timeout}
    end
  end
It makes the most sense to me that I should be handling the DNS query timeout at the point of making that DNS query, but it's not getting handled by the try/catch block. I think this is happening because the recv! call panics on a timeout, bypassing my try/catch block, but I could be wrong here.
Based on my understanding, the supervisor should restart the terminated GenServers, but for whatever reason, once they terminate from the timeout, they are never restarted.
Application config with Supervisor details
defmodule DmarcHijack.Application do
  use Application

  defp poolboy_config do
    [
      name: {:local, :worker},
      worker_module: DmarcHijack.Worker,
      size: 5,
      max_overflow: 5
    ]
  end

  @impl true
  def start(_type, _args) do
    children = [
      DmarcHijack.ResultsBucket,
      :poolboy.child_spec(:worker, poolboy_config())
    ]

    opts = [strategy: :one_for_one, name: DmarcHijack.Supervisor]
    Supervisor.start_link(children, opts)
  end
end
I'd really appreciate any help available to debug this issue. Thanks!
Upvotes: 0
Views: 131
Reputation: 1
For anyone who's dealing with the same issue that I am, I resolved this issue by doing the following:
1. Replacing catch with rescue for the DNS query.
2. Setting the timeout to :infinity, since the timeout is being handled already by DNS.
I'm pretty sure this isn't the best solution, but it worked for me.
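The difference between catch and rescue described above can be sketched roughly as follows. In Elixir, a bare catch pattern only matches thrown values, while an exception raised with raise (as Socket.Datagram.recv!/2 does in the stack trace) has to be handled with rescue. This is only a sketch under assumptions: FakeDNS, DmarcSketch, and WorkerSketch are hypothetical stand-ins and not part of the dns, socket, or poolboy libraries, and the :infinity timeout is assumed here to apply to GenServer.call, which the answer does not state explicitly.

```elixir
defmodule FakeDNS do
  # Hypothetical stand-in for DNS.query: raises an exception for one
  # domain, imitating the Socket.Error raised by recv!/2 on a timeout.
  def query("_dmarc.slow.example"), do: raise(RuntimeError, "timeout")
  def query(name), do: {:ok, "v=DMARC1; p=none (#{name})"}
end

defmodule DmarcSketch do
  # A bare pattern under `catch` only matches thrown values,
  # so the raised exception passes straight through this block.
  def fetch_with_catch(domain) do
    try do
      FakeDNS.query("_dmarc.#{domain}")
    catch
      _thrown -> {:error, :timeout}
    end
  end

  # `rescue` matches raised exceptions, so the error is turned into a tuple.
  def fetch_with_rescue(domain) do
    try do
      FakeDNS.query("_dmarc.#{domain}")
    rescue
      _error -> {:error, :timeout}
    end
  end
end

defmodule WorkerSketch do
  # Hypothetical worker, loosely modelled on the handle_call in the question.
  use GenServer

  def start_link(_args), do: GenServer.start_link(__MODULE__, nil)

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:fetch_process_dmarc, domain}, _from, state) do
    {:reply, {domain, DmarcSketch.fetch_with_rescue(domain)}, state}
  end
end

{:ok, pid} = WorkerSketch.start_link([])

# Assumption: the client waits indefinitely and relies on the lookup
# itself to report the timeout.
result = GenServer.call(pid, {:fetch_process_dmarc, "slow.example"}, :infinity)
IO.inspect(result)
IO.inspect(Process.alive?(pid))
```

With rescue in place, the worker replies with an error tuple instead of terminating, so the pool should not lose workers on a failed lookup. A more targeted variant would rescue only the specific exception (Socket.Error in the trace above) rather than every error.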
Upvotes: 0