Reputation: 2304
I've been load-testing a toy web server I built. It performs very well, except for a few outliers. Here's the relevant code:
    init() ->
        % Gets the listen socket ({active,false}), generates acceptor threads
        case gen_tcp:listen(?LISTEN_PORT, ?TCP_OPTS) of
            {ok, Listen} ->
                ?MODULE:gen_accepts(50, Listen)
        end,
        ?MODULE:supervisor_loop(Listen).

    supervisor_loop(LS) ->
        receive
            _ -> ok
        after 60000 -> ok
        end,
        ?MODULE:supervisor_loop(LS).

    gen_accepts(0, _) -> ok;
    gen_accepts(I, LS) ->
        spawn(?MODULE, accept_loop, [LS]),
        ?MODULE:gen_accepts(I-1, LS).

    accept_loop(Listen) ->
        case gen_tcp:accept(Listen) of
            {ok, Sock} ->
                spawn(?MODULE, accept_loop, [Listen]),
                ?MODULE:process_sock(Sock);
            {error, _} -> ?MODULE:accept_loop(Listen)
        end.
Right now all ?MODULE:process_sock(Sock) does is send some text and close the connection; it does no disk I/O or other work. When I run ApacheBench (ab) against it, however, about 1 run in 5 gives results like this:
    Percentage of the requests served within a certain time (ms)
      50%      3
      66%      3
      75%      4
      80%      4
      90%    271
      95%    271
      98%    271
      99%    271
     100%    271 (longest request)
That was with 20 total requests at a concurrency level of 20, so basically I made 20 requests at once. As you can see, most requests complete in very little time, but one or two take a very long time. When I up the load, the longest request can take up to 3 seconds; the highest I've seen is 9!
I did some debugging and found that the problem is in the accepting code. I timed how long it took to get from the start of process_sock to the end and found it never varied, but when I moved the start of the timer to just before gen_tcp:accept, the time difference showed up. For some reason accept isn't accepting. I've tried upping the number of acceptors initially generated, as well as trying different design patterns for spawning process_sock workers, but nothing changes. I should note that I'm currently starting with 50 acceptors, but in the ab output above only 20 requests were made, so I don't think the number of workers is the answer.
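A measurement like the one described above could be sketched roughly as follows. This is only an illustration: the module name accept_timing and the helper timed_accept/1 are hypothetical and are not part of the server code shown earlier. It relies on timer:tc/3, which returns the elapsed time in microseconds together with the result of the call. Note that the elapsed time also includes any time spent idle waiting for a client to connect.

```erlang
-module(accept_timing).
-export([timed_accept/1]).

%% Hypothetical helper: wraps a single gen_tcp:accept/1 call and
%% prints how long it took to return, in microseconds.
timed_accept(Listen) ->
    {Micros, Result} = timer:tc(gen_tcp, accept, [Listen]),
    io:format("accept returned after ~p us~n", [Micros]),
    Result.
```

Calling accept_timing:timed_accept(Listen) in place of gen_tcp:accept(Listen) inside accept_loop/1 would log one timing line per accepted connection.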
I'm running Erlang R14B04, if that helps.
Upvotes: 4
Views: 153
Reputation: 2984
Is {backlog,integer()} set to a reasonable number in your ?TCP_OPTS? It defaults to 5 and you could be losing connections at the end if the backlog isn't clearing fast enough.
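As a minimal sketch of what that could look like: the option list below is an assumption, since the question doesn't show what ?TCP_OPTS actually contains, and the value 128 is just an illustrative number rather than a recommendation. The module name listen_opts is also hypothetical.

```erlang
-module(listen_opts).
-export([listen/1]).

%% Assumed option list; the only part that matters here is {backlog, N}.
%% 128 is an arbitrary example value, to be tuned for the expected burst size.
-define(TCP_OPTS, [binary, {active, false}, {reuseaddr, true}, {backlog, 128}]).

%% Opens the listen socket with the larger backlog.
listen(Port) ->
    gen_tcp:listen(Port, ?TCP_OPTS).
```

The operating system may cap the effective backlog below the requested value, so it may be worth checking the kernel limit as well.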
Upvotes: 3