jdm

Limit FastAPI/gunicorn/... worker to certain endpoints to save memory

I have a FastAPI application with multiple endpoints, and each endpoint uses certain memory-intensive objects (ML models). This works fine with a single worker, but I am worried about memory usage (and, to a lesser extent, startup time) when I scale to multiple workers.

Is there a way to limit certain workers to certain endpoints only? Then I would only load the objects required for the respective endpoint.

Specifically, assume I have two endpoints whose models use 2 GB each. If I scale to four workers, every worker loads both models, so I need 2 GB x 2 x 4 = 16 GB.

If instead the first two workers only serve the first endpoint and the other two workers only serve the second endpoint, every process only needs to load one of the models, so I would need 2 GB x 4 = 8 GB. This of course assumes that the load on the two endpoints is roughly equal, which is the case here.
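A minimal sketch of the memory arithmetic above, assuming each worker is a separate process holding its own copy of every model it loads (no sharing between processes) and that workers can be split evenly across endpoints. The `Endpoint` class, the helper functions and the paths `/first` and `/second` are hypothetical illustrations, not part of FastAPI or gunicorn. The sketch only models the memory estimate; it does not pin any worker to an endpoint by itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    path: str        # hypothetical endpoint path
    model_gb: float  # memory needed for one loaded copy of this endpoint's model


def memory_all_models(endpoints: list[Endpoint], workers: int) -> float:
    """Every worker serves every endpoint, so every worker loads every model."""
    per_worker = sum(e.model_gb for e in endpoints)
    return per_worker * workers


def assign_workers(endpoints: list[Endpoint], workers: int) -> dict[str, int]:
    """Split workers evenly across endpoints (assumes roughly equal load)."""
    if workers % len(endpoints) != 0:
        raise ValueError("workers must divide evenly among endpoints")
    per_endpoint = workers // len(endpoints)
    return {e.path: per_endpoint for e in endpoints}


def memory_partitioned(endpoints: list[Endpoint], workers: int) -> float:
    """Each worker loads only the model of the endpoint it is assigned to."""
    plan = assign_workers(endpoints, workers)
    return sum(e.model_gb * plan[e.path] for e in endpoints)


endpoints = [Endpoint("/first", 2.0), Endpoint("/second", 2.0)]
print(memory_all_models(endpoints, workers=4))   # 16.0
print(memory_partitioned(endpoints, workers=4))  # 8.0
```

With two 2 GB models and four workers, this reproduces the 16 GB versus 8 GB comparison from the question.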

Alternatives:
