Reputation: 1
I have been trying to Dockerize Ollama and load the Llama 3.1 model into a Google Cloud Run deployment. While Ollama itself runs as expected on Cloud Run, the model is not loaded: hitting v1/models returns a null result. I have a hacky workaround on Compute Engine, where I SSH in to run the Docker image and then pull and run the model manually, but that is neither cost-effective nor efficient in the long term. I would like help figuring out how to load an LLM into Ollama through a single Dockerfile deployed to Google Cloud Run, if that is possible.
Here is my current Dockerfile.
FROM ollama/ollama
WORKDIR /app
RUN apt-get update && apt-get install -y wget && apt-get install -y --no-install-recommends git curl
ENV DEBIAN_FRONTEND=noninteractive
ENV OLLAMA_KEEP_ALIVE=24h
EXPOSE 11434
VOLUME [ "./ollama/ollama:/root/.ollama" ]
ENTRYPOINT ["/bin/bash", "-c", "ollama serve & sleep 5 && ollama run llama3.1 && tail -f /dev/null"]
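For context, a commonly suggested pattern for Cloud Run is to pull the model at build time, so the weights are baked into the image layers instead of being fetched at container start-up. Below is a hedged sketch of that idea (untested here; the llama3.1 tag, the /models path, and the port are assumptions you would adapt). Moving OLLAMA_MODELS off /root/.ollama guards against the case where that path is declared as a volume, in which case files written there during the build would not persist into the image:

```dockerfile
FROM ollama/ollama

# Listen on all interfaces; Cloud Run routes external traffic to the
# container port configured for the service.
ENV OLLAMA_HOST=0.0.0.0:11434
# Store model weights outside /root/.ollama so they persist into the
# image layers even if the base image marks that path as a volume.
ENV OLLAMA_MODELS=/models
ENV OLLAMA_KEEP_ALIVE=24h

# Pull the model at build time: start the server in the background,
# give it a moment to come up, then fetch the weights into the image.
RUN ollama serve & sleep 5 && ollama pull llama3.1

EXPOSE 11434
ENTRYPOINT ["ollama", "serve"]
```

With the weights baked in, `ollama serve` alone should be enough at runtime, so v1/models should list the model as soon as the container is ready. The trade-off is a much larger image (several GB for llama3.1), which slows builds and deploys.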
After hitting the Cloud Run URL's v1/models endpoint, I expect a response like this:
{
  "object": "list",
  "data": [
    {
      "id": "llama3.1:latest",
      "object": "model",
      "created": 1724837893,
      "owned_by": "library"
    }
  ]
}
Currently, the response is just:
{
  "object": "list",
  "data": null
}
Upvotes: 0
Views: 183