Reputation: 2876
I have an Elixir/Phoenix app running in production, and after a while one of the beam.smp processes goes to 100% CPU load (sometimes more than one process). I'm not aware of any trigger that causes this. How can I find out what's happening?
EDIT:
I ran iex on the server and connected to the Phoenix node. Then I ran etop and got this output:
Load: cpu 100 Memory: total 69429 binary 10568
procs 303 processes 16656 code 20194
runq 1 atom 727 ets 7205
Pid Name or Initial Func Time Reds Memory MsgQ Current Function
----------------------------------------------------------------------------------------
<19947.645.0> cowboy_protocol:init '-'90164000 88736 0 'Elixir.MyApp.Error
<19947.902.0> cowboy_protocol:init '-'88696000 88744 0 'Elixir.MyApp.Error
<19947.242.0> 'Elixir.Redix.Connec '-' 11697 24704 0 gen_server:loop/6
<19947.240.0> Elixir.Exq '-' 10284 24664 0 gen_server:loop/6
<19947.236.0> Elixir.Exq.Redis.Cli '-' 9597 34520 0 gen_server:loop/6
<19947.1695.0> etop_txt:init/1 '-' 6258 230504 0 etop:update/1
<19947.245.0> Elixir.Exq.Scheduler '-' 4831 24664 0 gen_server:loop/6
<19947.241.0> 'Elixir.Redix.Connec '-' 2339 8856 0 gen_server:loop/6
<19947.426.0> Elixir.MyApp.Presen '-' 262 143160 0 gen_server:loop/6
<19947.238.0> Elixir.Exq.Stats '-' 105 42344 0 gen_server:loop/6
========================================================================================
Those two cowboy_protocol:init entries seem to be causing the problem. But why, and how can I stop, prevent, or debug it?
Upvotes: 1
Views: 1192
Reputation: 11278
Processes started with cowboy_protocol:init are the processes that handle HTTP requests. The high reduction counts suggest they are stuck in some kind of infinite loop. Both processes seem to be executing the same function (the one shown in the Current Function column), so there is a very high chance that this function is faulty.
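To see exactly where such a process is spinning, one option is to rank processes by reduction count and then ask the suspicious ones for a stack trace. The following is only a sketch: `BusyProcs` is a hypothetical helper module, not part of Phoenix or Cowboy. It relies on `Process.info/2`, which only works for pids local to the node where the code runs, so it assumes you evaluate it on the affected node itself (for example through a remote shell started with `iex --remsh`) rather than from a separate node connected to it.

```elixir
defmodule BusyProcs do
  # Hypothetical helper for illustration; the module and function names are assumptions.

  # Returns up to `n` {pid, info} pairs, ordered by reduction count, highest first.
  def top(n \\ 5) do
    Process.list()
    |> Enum.map(fn pid ->
      {pid, Process.info(pid, [:reductions, :current_function, :message_queue_len])}
    end)
    # Process.info/2 returns nil for a process that exited in the meantime.
    |> Enum.reject(fn {_pid, info} -> is_nil(info) end)
    |> Enum.sort_by(fn {_pid, info} -> info[:reductions] end, &>=/2)
    |> Enum.take(n)
  end

  # Returns the current stack trace of `pid`, or nil if the process is gone.
  def stacktrace(pid) do
    case Process.info(pid, :current_stacktrace) do
      {:current_stacktrace, trace} -> trace
      nil -> nil
    end
  end
end
```

Calling `BusyProcs.top(5)` and then `BusyProcs.stacktrace(pid)` on the top entries should show the full module, function, and arity that etop truncates. A stuck process can be terminated with `Process.exit(pid, :kill)`, which frees the CPU but does not fix the underlying bug.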
An infinite loop in tail position doesn't consume any additional memory, only CPU. This is very much a feature, and it is exactly how a GenServer works: an infinite loop in tail position. So the compiler (or runtime) has no way of distinguishing between faulty and correct code that uses this pattern.
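The effect is easy to reproduce in isolation. The sketch below uses a hypothetical `SpinDemo` module (not taken from the question's app) whose `loop/1` calls itself in tail position without ever waiting for a message. Sampling the process twice shows the reduction count climbing, which is the same symptom as in the etop output, while the memory figure can be compared by eye.

```elixir
defmodule SpinDemo do
  # Hypothetical module for illustration only.
  # A tail call reuses the current stack frame, so this never grows the stack.
  def loop(state), do: loop(state)
end

pid = spawn(fn -> SpinDemo.loop(:some_state) end)

# Take two samples 100 ms apart.
Process.sleep(100)
{:reductions, r1} = Process.info(pid, :reductions)
{:memory, m1} = Process.info(pid, :memory)

Process.sleep(100)
{:reductions, r2} = Process.info(pid, :reductions)
{:memory, m2} = Process.info(pid, :memory)

IO.inspect(%{reductions_grew: r2 > r1, memory_before: m1, memory_after: m2})

# Stop the runaway process so it does not keep a scheduler busy.
Process.exit(pid, :kill)
```

A GenServer loop has the same shape, except that it blocks in `receive` between iterations, so it only burns reductions when there is work to do.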
This is also very much a tribute to the praised "fault tolerance" of Erlang/Elixir: even though there is an infinite loop in one branch of the program, the rest functions completely normally and responds to requests in a timely manner. Very few platforms are able to do that.
Upvotes: 3