Philip Claren

Reputation: 2876

Production Elixir/Phoenix app hogs CPU

I have an Elixir/Phoenix app running in production, and after a while one of the beam.smp processes goes to 100% CPU load (sometimes more than one process). I'm not aware of any trigger causing this. How can I find out what's happening?

EDIT:

I ran iex on the server and connected to the Phoenix node. Then I ran etop and got this output:

Load:  cpu       100               Memory:  total       69429    binary      10568
        procs     303                        processes   16656    code        20194
        runq        1                        atom          727    ets          7205
Pid            Name or Initial Func    Time    Reds  Memory    MsgQ Current Function
----------------------------------------------------------------------------------------
<19947.645.0>  cowboy_protocol:init     '-'90164000   88736       0 'Elixir.MyApp.Error
<19947.902.0>  cowboy_protocol:init     '-'88696000   88744       0 'Elixir.MyApp.Error
<19947.242.0>  'Elixir.Redix.Connec     '-'   11697   24704       0 gen_server:loop/6
<19947.240.0>  Elixir.Exq               '-'   10284   24664       0 gen_server:loop/6
<19947.236.0>  Elixir.Exq.Redis.Cli     '-'    9597   34520       0 gen_server:loop/6
<19947.1695.0> etop_txt:init/1          '-'    6258  230504       0 etop:update/1
<19947.245.0>  Elixir.Exq.Scheduler     '-'    4831   24664       0 gen_server:loop/6
<19947.241.0>  'Elixir.Redix.Connec     '-'    2339    8856       0 gen_server:loop/6
<19947.426.0>  Elixir.MyApp.Presen      '-'     262  143160       0 gen_server:loop/6
<19947.238.0>  Elixir.Exq.Stats         '-'     105   42344       0 gen_server:loop/6
========================================================================================

Those two cowboy_protocol:init entries are causing the problem. But why? And how can I stop, prevent, or debug it?

Upvotes: 1

Views: 1192

Answers (1)

michalmuskala

Reputation: 11278

Processes started with cowboy_protocol:init are the processes that handle HTTP requests. Their extremely high reduction counts (roughly 90 million each, while no other process exceeds about 12 thousand) suggest they are stuck in some kind of infinite loop. Both processes seem to be executing the same function, so there's an extremely high chance that this function is faulty.
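To confirm which function a runaway process is stuck in, one option is to sample every process's reduction count twice and inspect the biggest consumer with `Process.info/2`. The sketch below is self-contained: `RunawayDemo.spin/0` is a hypothetical stand-in for the faulty function, and `TopReductions` is a hypothetical helper, not part of Phoenix or Cowboy. On the production node you would skip the demo spinner and, from an `iex --remsh` session on that node, pass the pid reported by etop instead (IEx's `pid(0, 645, 0)` for `<19947.645.0>`, assuming the first field only identifies the node from etop's point of view).

```elixir
defmodule RunawayDemo do
  # Hypothetical faulty function: a tail call that never returns or blocks.
  def spin, do: spin()
end

defmodule TopReductions do
  # Samples the reduction count of every process twice, `interval_ms` apart,
  # and returns the `n` processes whose count grew the most.
  def top(interval_ms \\ 500, n \\ 3) do
    first = snapshot()
    Process.sleep(interval_ms)
    second = snapshot()

    second
    |> Enum.map(fn {pid, reds} -> {pid, reds - Map.get(first, pid, 0)} end)
    |> Enum.sort_by(fn {_pid, delta} -> delta end, :desc)
    |> Enum.take(n)
  end

  # Process.info/2 returns nil for processes that exited in the meantime;
  # the pattern in the second generator silently skips those.
  defp snapshot do
    for pid <- Process.list(),
        {:reductions, reds} <- [Process.info(pid, :reductions)],
        into: %{},
        do: {pid, reds}
  end
end

runaway = spawn(RunawayDemo, :spin, [])

[{pid, delta} | _] = TopReductions.top(200, 3)
IO.inspect({pid, delta}, label: "busiest process")
# Shows where the busiest process is currently executing.
IO.inspect(Process.info(pid, :current_stacktrace), label: "stacktrace")

# Killing the process stops the CPU burn.
Process.exit(runaway, :kill)
```

Killing a stuck request process only treats the symptom; the stacktrace points at the function that actually needs fixing.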

An infinite loop in tail position doesn't consume any additional memory, only CPU. This is very much a feature: it is exactly how a GenServer works, since its receive loop is an infinite loop in tail position. As a result, neither the compiler nor the runtime has any way of distinguishing faulty code from correct code that uses this pattern.
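The difference between a legitimate infinite loop and a faulty one can be observed directly. In this sketch (all module and function names are hypothetical), `Loops.server/1` has the same shape as a GenServer's receive loop, while `Loops.busy/1` is a tail-recursive loop that never waits. `Sample.deltas/2` measures how much each process's reductions and memory change over a short interval.

```elixir
defmodule Loops do
  # A correct infinite loop, shaped like a GenServer's receive loop:
  # it recurses in tail position but blocks in `receive` while idle.
  def server(state) do
    receive do
      {:get, from} ->
        send(from, {:state, state})
        server(state)

      {:put, value} ->
        server(value)
    end
  end

  # A faulty infinite loop: also tail-recursive, but it never waits,
  # so it keeps a scheduler busy while its memory stays flat.
  def busy(state), do: busy(state)
end

defmodule Sample do
  # Returns {reductions_delta, memory_delta} for `pid` over `ms` milliseconds.
  def deltas(pid, ms) do
    {r1, m1} = read(pid)
    Process.sleep(ms)
    {r2, m2} = read(pid)
    {r2 - r1, m2 - m1}
  end

  defp read(pid) do
    [reductions: r, memory: m] = Process.info(pid, [:reductions, :memory])
    {r, m}
  end
end

server = spawn(Loops, :server, [0])
busy = spawn(Loops, :busy, [0])

IO.inspect(Sample.deltas(server, 200), label: "server {reds, memory}")
IO.inspect(Sample.deltas(busy, 200), label: "busy {reds, memory}")

Process.exit(busy, :kill)
Process.exit(server, :kill)
```

Both loops run forever and neither grows its memory, but only the busy one accumulates reductions, which matches the pattern of the two cowboy_protocol:init processes in the etop output.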

This is also very much a tribute to the praised "fault tolerance" of Erlang/Elixir: even though one branch of the program contains an infinite loop, the rest of the system functions completely normally and keeps responding to requests in a timely manner, because the BEAM scheduler preempts each process after a limited budget of reductions. Very few platforms are able to do that.
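The isolation described above can be checked with a small experiment. This sketch (the `Preemption` module is hypothetical) occupies every online scheduler with a spinning process and then verifies that an ordinary process still receives a message and replies within a timeout, which only works because the scheduler preempts the spinners.

```elixir
defmodule Preemption do
  # Hypothetical faulty function: a tail-recursive loop that never blocks.
  def spin, do: spin()

  # Occupies every online scheduler with a spinning process, then checks
  # whether an ordinary process still answers a message within timeout_ms.
  def still_responsive?(timeout_ms \\ 1_000) do
    spinners =
      for _ <- 1..System.schedulers_online(), do: spawn(__MODULE__, :spin, [])

    parent = self()

    worker =
      spawn(fn ->
        receive do
          :ping -> send(parent, :pong)
        end
      end)

    send(worker, :ping)

    result =
      receive do
        :pong -> true
      after
        timeout_ms -> false
      end

    # Clean up the spinners so they don't keep burning CPU.
    Enum.each(spinners, &Process.exit(&1, :kill))
    result
  end
end

IO.inspect(Preemption.still_responsive?(), label: "responsive while all schedulers spin")
```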

Upvotes: 3
