Halid

Improve Erlang Cowboy performance

We have been running Cowboy in production on Compute Engine machines on GCP, and we have started benchmarking and tuning our service to handle more requests/sec (bids/sec in our case, since we are in adtech).

After isolating and fixing many of the issues separately, we narrowed it down to Cowboy itself. These are our current findings and limitations:

Cowboy setup

We are using Cowboy 2.5 with 200 acceptors and a max backlog of 1024.
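For reference, a listener with these settings might be started roughly as below. This is a sketch assuming Cowboy 2.5 on Ranch 1.6, which accepts transport options as a map; the module `bid_listener`, the listener name `bid_http`, the `/bid` route, the handler module `bid_handler` and port 8080 are all hypothetical.

```erlang
-module(bid_listener).
-export([start/0]).

%% Hypothetical listener: only num_acceptors and backlog come from the setup
%% described above; the listener name, route, handler module and port are made up.
start() ->
    Dispatch = cowboy_router:compile([
        {'_', [{"/bid", bid_handler, []}]}
    ]),
    cowboy:start_clear(bid_http,
                       #{num_acceptors => 200,               %% Ranch acceptor processes
                         socket_opts => [{port, 8080},
                                         {backlog, 1024}]},  %% listen backlog
                       #{env => #{dispatch => Dispatch}}).
```

Note that `num_acceptors` only controls how many processes accept new connections; with keep-alive, as in this benchmark, most requests arrive on connections that were already accepted, so this setting likely matters less than the per-request path.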

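-record(state, {t1}). %% minimal definition; the real record may carry more fields

init(Req0, _State) ->
    T1 = erlang:monotonic_time(),
    %% read_body/1 returns an updated Req; carry that one forward, not Req0
    {ok, BRjson, Req} = cowboy_req:read_body(Req0),
    %% ---- rest of work (using BRjson) goes here but is switched off for our test ---
    %% simulate a worker reply after 60 ms; the loop handler's info/3 (not shown) receives it
    erlang:send_after(60, self(), {'RSP', x, no_workers}),
    {cowboy_loop, Req, #state{t1 = T1}, hibernate}.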

Erlang VM

OTP 21

VM args: -smp auto +P 134217727 +K true +A 64 -rate 1200 +stbt db +scl false +sfwi 500 +spp true +zdbbl 8092
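One way to confirm that these flags actually took effect on the benchmark machine is to read the corresponding values back with `erlang:system_info/1`, for example from a remote shell. A small sketch; the module name `vm_check` is hypothetical, and it only covers flags that have a direct `system_info` counterpart.

```erlang
-module(vm_check).
-export([summary/0]).

%% Reads back runtime values that correspond to some of the VM args above,
%% so a benchmark run can confirm the flags were applied.
summary() ->
    #{
      %% by default roughly one scheduler per available logical CPU (+S overrides)
      schedulers_online => erlang:system_info(schedulers_online),
      %% +P: the VM may round the requested limit up
      process_limit => erlang:system_info(process_limit),
      %% +A: size of the async thread pool
      async_threads => erlang:system_info(thread_pool_size),
      %% +stbt: the scheduler bind type that was requested
      scheduler_bind_type => erlang:system_info(scheduler_bind_type),
      %% +zdbbl: given in kilobytes, reported here in bytes
      dist_buf_busy_limit => erlang:system_info(dist_buf_busy_limit)
    }.
```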

Load

JSON requests are ~4 KB in size. Testing is done with JMeter from a separate machine on the same internal network (no SSL). All requests are POSTs with keep-alive.

Servers

GCP Compute Engine with 10 vCPUs and 14 GB RAM (we previously tested on a 4 vCPU machine as well)

Findings

We are able to reach ~1900 reqs/sec, but all CPU cores in htop show almost 80% utilization.

At 1000 reqs/sec we see 45-50% CPU utilization per core (still high, bearing in mind that no other part of our application is running).
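To put these numbers in perspective, they can be turned into an approximate CPU cost per request. A minimal sketch, assuming every core sits at the same average utilization that htop shows (a simplification); the module `cpu_budget` and its function are hypothetical.

```erlang
-module(cpu_budget).
-export([cpu_ms_per_request/3]).

%% Approximate CPU milliseconds spent per request.
%% Cores: number of vCPUs; Util: average per-core utilization as a fraction
%% (0.0-1.0); Rps: sustained requests per second.
cpu_ms_per_request(Cores, Util, Rps) when Rps > 0 ->
    Cores * Util * 1000 / Rps.
```

With the figures above, `cpu_budget:cpu_ms_per_request(10, 0.80, 1900)` returns about 4.2, and at 1000 reqs/sec with 45-50% utilization it returns 4.5-5.0, i.e. several milliseconds of CPU time per ~4 KB request while the application logic is switched off.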

*Note: on the 4 vCPU machine we were able to get close to 700 reqs/sec, and in all of our tests memory is barely utilized and barely changes with load.
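htop reports OS-level CPU time for the VM's threads, which can include time schedulers spend spinning before they go to sleep. To see how busy the Erlang schedulers themselves are during a run, one option is to sample `erlang:statistics(scheduler_wall_time)`; per the ERTS documentation its active time counts executing code, garbage collection and similar work, not idle time. A minimal sketch; the module name `sched_util` is hypothetical.

```erlang
-module(sched_util).
-export([sample/1]).

%% Samples scheduler utilization over IntervalMs milliseconds.
%% Returns [{SchedulerId, Utilization}] with Utilization in 0.0..1.0.
sample(IntervalMs) ->
    %% scheduler_wall_time returns data only while this flag is enabled
    Old = erlang:system_flag(scheduler_wall_time, true),
    T0 = lists:sort(erlang:statistics(scheduler_wall_time)),
    timer:sleep(IntervalMs),
    T1 = lists:sort(erlang:statistics(scheduler_wall_time)),
    %% restore the previous setting
    erlang:system_flag(scheduler_wall_time, Old),
    [{Id, util(A1 - A0, Tot1 - Tot0)}
     || {{Id, A0, Tot0}, {Id, A1, Tot1}} <- lists:zip(T0, T1)].

%% Active time divided by total wall time for one scheduler.
util(_Active, 0) -> 0.0;
util(Active, Total) -> Active / Total.
```

Running `sched_util:sample(1000)` during a JMeter run and comparing it with htop could show whether the VM is doing real work or mostly spinning.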


QUESTION: How can we improve Cowboy's performance in terms of CPU usage?


Answers (1)

Halid

First off, thanks @Pouriya for the suggestions. Discussing this back and forth made me go back and re-check one of my comments about the right tool for the job. PS: we are on GCP, so 72 cores are out of the question at this stage.

Cowboy is great, but it does add some overhead to the critical path of each request, a feature (or an issue, in my case) that we don't need.

We tested again with Elli (https://github.com/elli-lib/elli), this time with a proper testing setup, and it gave us an improvement of up to 20%, exactly what we needed!

If anyone on the Cowboy/Ranch team has a way to drastically reduce the CPU overhead, I will gladly test it, since we still use Cowboy in our APIs, just not in the critical path.
