RMcNairn

Reputation: 491

Rails consumption of external API requiring staggered polling

I am using an external service to perform a search for my application.

The results of this search need to be collected from multiple partners and take between 10 and 90 seconds to complete. While results are being collected I am repeatedly polling my search session to collect the results that have already been prepared.

As and when I have new results, I push these up to the client via SSE.

I am polling every 5 seconds or so.

How should I be running this process without tying up one of my threads for the full 90 seconds (running Puma + Nginx)? I need to maintain my controller's state to push the SSEs to the requesting client, and I am unsure of the best way of dealing with the delays between polls.
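For reference, the blocking pattern described above can be sketched like this (method names and the `StringIO` stand-in are illustrative; in Rails the stream would be `response.stream` in a controller that includes `ActionController::Live`):

```ruby
require "stringio"

# Illustrative stand-in for the poll/push cycle. Each element of `batches`
# represents what the search session has accumulated at one poll; only the
# results the client has not yet seen are written out in SSE wire format.
# In production there would be a `sleep 5` between polls, which is exactly
# what parks a Puma thread for up to 90 seconds per client.
def stream_search_results(stream, batches)
  seen = 0
  batches.each do |results|
    results[seen..].each { |r| stream.write("data: #{r}\n\n") }
    seen = results.size
  end
end

out = StringIO.new
stream_search_results(out, [["flight A"], ["flight A", "flight B"]])
out.string # => "data: flight A\n\ndata: flight B\n\n"
```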

Much appreciated.

Upvotes: 2

Views: 344

Answers (1)

dre-hh
dre-hh

Reputation: 8044

You have to give up on SSEs if you really want to release the threads. To receive SSEs, a browser maintains a long-lived connection to the web server, and in the case of Puma each client connection is handled by a separate thread.

However, if you just want to poll for partial results, you can use the following strategy:

  1. Start the search in a background job, e.g. with Sidekiq
  2. Cache partial results for each search request in an in-memory store like Redis
  3. Poll Redis for the results from your controller
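A minimal sketch of that strategy, with a plain Hash standing in for Redis so it runs standalone (in production `append` would be an `RPUSH` from the Sidekiq job and `poll` an `LRANGE` from the controller; all names here are illustrative, not from the question):

```ruby
# Accumulates partial search results per search session and lets a polling
# controller action fetch only the results the client has not seen yet.
class PartialResultStore
  def initialize
    # Hash stands in for Redis here; each search id maps to a result list.
    @store = Hash.new { |h, k| h[k] = [] }
  end

  # Called by the background job each time a partner returns results.
  def append(search_id, results)
    @store[search_id].concat(results)
  end

  # Called by the polling action: the client sends back the offset it got
  # from its previous poll, and receives only the new results.
  def poll(search_id, offset)
    all = @store[search_id]
    { results: all[offset..] || [], next_offset: all.size }
  end
end

store = PartialResultStore.new
store.append("search-1", ["partner A result"])
first = store.poll("search-1", 0)
store.append("search-1", ["partner B result"])
second = store.poll("search-1", first[:next_offset])
```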

Another option might be moving the messaging problem to an evented server. An evented server does not spawn a separate thread per connection, whether long-lived or not.

One such evented server, which integrates well with Rails, is Faye. The procedure would be:

  1. Client subscribes to a Faye message channel
  2. Client initiates the search
  3. Search is performed within a background job (Sidekiq)
  4. Background job periodically publishes partial results on the same Faye channel
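The message flow above can be sketched with an in-process bus standing in for Faye, so it runs standalone (channel name illustrative; with the real gem the job would call `Faye::Client.new(url).publish(channel, message)` and the browser would subscribe via the faye-browser script):

```ruby
# Minimal in-process pub/sub bus illustrating the Faye channel pattern.
class ChannelBus
  def initialize
    @subscribers = Hash.new { |h, k| h[k] = [] }
  end

  # Step 1: client subscribes on a channel.
  def subscribe(channel, &handler)
    @subscribers[channel] << handler
  end

  # Step 4: background job publishes partial results on the same channel.
  def publish(channel, message)
    @subscribers[channel].each { |handler| handler.call(message) }
  end
end

bus = ChannelBus.new
received = []
bus.subscribe("/search/42") { |msg| received << msg }    # browser side
bus.publish("/search/42", partial: ["partner A result"]) # job side
bus.publish("/search/42", partial: ["partner B result"])
```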

Actually, the Puma multithreaded setup is intended to keep you from going through all of this. I would just increase the number of threads and processes as far as your system allows and see how that performs. Adding more RAM or some extra servers is always cheaper and lets you focus on other features.

Messaging with Faye

Edit 1: Rethinking what the actual benefit of moving the search into a background job would be: Sidekiq also has its own thread pool, and a Sidekiq thread does not differ from a Puma thread. The search task has to be done anyway, and its threads will be suspended most of the time, waiting for IO. So the only benefit of the two solutions above is proper resource balancing: they let you define how many threads are used for the search job and how many for your app server. So, how about the following strategy:

  1. Deploy the app twice, on the same or different machines
  2. Configure Nginx to route/load-balance search queries (with SSE) to one app instance
  3. Serve the rest of the app from the second instance
  4. Change not a single thing in your app logic
  5. PROFIT
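A sketch of the Nginx side of that split, assuming the SSE search lives under a `/search` path and the two instances listen on ports 3001 and 3000 (upstream names, paths, and ports are all illustrative):

```nginx
upstream search_app { server 127.0.0.1:3001; }  # instance dedicated to SSE search
upstream main_app   { server 127.0.0.1:3000; }  # instance serving the rest

server {
    listen 80;

    # Long-lived SSE search connections go to the dedicated instance.
    location /search {
        proxy_pass http://search_app;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_buffering off;       # don't buffer, or SSE events arrive late
        proxy_read_timeout 120s;   # longer than the 90s worst-case search
    }

    location / {
        proxy_pass http://main_app;
    }
}
```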

You can even abandon polling completely and just stick to SSEs.

Upvotes: 1
