Reputation: 13402
The goal is to make Compute Engine feel as much like a serverless architecture as possible.
In my system I'm rendering 60 frames per job; each job requires one CPU/process and takes 20s to complete.
Currently my VM is n1-standard-16 with 1 x NVIDIA Tesla T4. This means about 16 jobs can be run in parallel per instance, each taking around 20s (most likely fewer than 16 if I were to do benchmarking).
My goal is to make it easy to boot up as many instances as needed given a dynamic workload. For example, say we want to issue 100 jobs (each responsible for 60 frames and taking roughly 20s to complete):
100 (jobs) / 16 (vCPUs per instance) = 6.25 instances -- let's round up to 7.
I'm still learning MIGs, but I don't see a way to manage them this way; their autoscaling is based on CPU usage. My question is: would it be better to look into k8s for this need, or is there a way to do this inside a MIG?
Sounds like I can use KEDA; am I understanding it correctly?
sqs-trigger-example.yml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-trigger-auth-aws-credentials
  namespace: keda-test
spec:
  podIdentity:
    provider: aws-kiam # or aws-eks when using IRSA
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: aws-sqs-queue-scaledobject
  namespace: keda-test
spec:
  scaleTargetRef:
    name: hello-world
  triggers:
  - type: aws-sqs-queue
    authenticationRef:
      name: keda-trigger-auth-aws-credentials
    metadata:
      queueURL: myQueue
      queueLength: "1000"
      awsRegion: "eu-west-1"
deployment-example.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: load-balancer-example
  name: hello-world
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: load-balancer-example
  template:
    metadata:
      labels:
        app.kubernetes.io/name: load-balancer-example
    spec:
      containers:
      - image: nvidia/cuda/ubuntu image here
        name: test
        ports:
        - containerPort: 3000
        resources:
          limits:
            nvidia.com/gpu: 1
Upvotes: 0
Views: 436
Reputation: 61521
Yes, there are multiple ways. One that I can think of is using a pod for each job. Each pod requests the resources your job needs, and the pods can be distributed among Kubernetes nodes that meet your hardware requirements.
Now, the typical way of autoscaling is by CPU and memory, but you can also autoscale based on specific metrics. For that I would recommend taking a look at KEDA to autoscale your jobs based on some specific metric (e.g. the number of jobs waiting in a queue). KEDA also has the ScaledJob resource, which you can use in your case.
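If you go with ScaledJob, a minimal sketch could look roughly like the following. It reuses the SQS trigger and GPU limit from your example; the resource name, image placeholder, polling interval, and replica cap are illustrative values you would need to adjust.
scaledjob-example.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: render-frames-scaledjob # hypothetical name
  namespace: keda-test
spec:
  jobTargetRef:
    template:
      spec:
        containers:
        - name: render
          image: nvidia/cuda/ubuntu image here # same placeholder as the Deployment above
          resources:
            limits:
              nvidia.com/gpu: 1
        restartPolicy: Never
    backoffLimit: 2
  pollingInterval: 10 # seconds between queue checks
  maxReplicaCount: 100 # upper bound on concurrent Jobs
  triggers:
  - type: aws-sqs-queue
    authenticationRef:
      name: keda-trigger-auth-aws-credentials
    metadata:
      queueURL: myQueue
      queueLength: "1" # roughly one queued message per Job
      awsRegion: "eu-west-1"
With a setup like this, each queued message results in roughly one Kubernetes Job (up to maxReplicaCount), and the cluster autoscaler can add or remove GPU nodes to fit the pending pods, which is close to the "boot up as many instances as needed" behaviour you described.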
P.S. There may be other ways of scaling besides CPU; you may want to check with Google App Engine support.
Upvotes: 1