Reputation: 435
I am using app engine to serve a bunch of sklearn models. These models are around 100 mb in size, and there are around 25 of them.
Downloading them can take up to 15s at times, despite being in the designated app engine bucket, and is often dominating request times.
I currently use a FIFO cache layer wrapped around the GCS storage client, but cache hits aren't great as the different model are used quite interspersed and app engine memory is limited.
Memcache seems too small for this, and /tmp is also stored in RAM.
Is there a better solution for caching such files?
Upvotes: 1
Views: 156
Reputation: 75715
You can imagine different solution to solve your issue.
Last point, after solving the download issue part, you could have cold start issue: the time that take your server to start and to load in memory your model (at the first request, when the instance start). Cloud Run proposes a min-instance feature to keep warm a certain number of instances and therefore to eliminate the cold start issue.
Upvotes: 2