Reputation: 13402
The goal is to make Compute Engine feel as much like a serverless architecture as possible.
In my system I'm rendering 60 frames per job; each job requires one CPU/process and takes 20s to complete.
Currently my VM is n1-standard-16 with 1 x NVIDIA Tesla T4. This means about 16 jobs can be run in parallel per instance, each taking around 20s (most likely fewer than 16 if I were to do benchmarking).
My goal is to make it easy to boot up as many instances as needed given a dynamic workload. For example, say we want to issue 100 jobs (each responsible for 60 frames and taking roughly 20s to complete):
100 (jobs) / 16 (vCPUs per instance) = 6.25 instances -- let's round up to 7.
I'm still learning MIGs, but I don't see a way to manage them this way; their autoscaling is based on CPU usage. My question is: would it be better to look into k8s for this need, or is there a way to do this inside a MIG?
Sounds like I can use KEDA; am I understanding it correctly?
sqs-trigger-example.yml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-trigger-auth-aws-credentials
  namespace: keda-test
spec:
  podIdentity:
    provider: aws-kiam # or aws-eks when using IRSA
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: aws-sqs-queue-scaledobject
  namespace: keda-test
spec:
  scaleTargetRef:
    name: hello-world
  triggers:
  - type: aws-sqs-queue
    authenticationRef:
      name: keda-trigger-auth-aws-credentials
    metadata:
      queueURL: myQueue
      queueLength: "1000"
      awsRegion: "eu-west-1"
deployment-example.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: load-balancer-example
  name: hello-world
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: load-balancer-example
  template:
    metadata:
      labels:
        app.kubernetes.io/name: load-balancer-example
    spec:
      containers:
      - image: nvidia/cuda/ubuntu image here
        name: test
        ports:
        - containerPort: 3000
        resources:
          limits:
            nvidia.com/gpu: 1
Upvotes: 0
Views: 436
Reputation: 61521
Yes, there are multiple ways. One that I can think of is using a pod for each job. Each pod requests the resources your job needs, and the pods can be distributed among Kubernetes nodes that meet your hardware requirements.
Now, the typical way of autoscaling is by CPU and memory, but you can also autoscale based on specific metrics. For that I would recommend taking a look at KEDA to autoscale your jobs based on some specific metric (e.g. the number of jobs waiting in a queue). KEDA also has the ScaledJob resource, which you can use in your case.
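If you go with ScaledJob, a minimal sketch could look roughly like the following. It reuses the SQS trigger and GPU limit from your example; the resource name, image placeholder, polling interval, and replica cap are illustrative values you would need to adjust.
scaledjob-example.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: render-frames-scaledjob # hypothetical name
  namespace: keda-test
spec:
  jobTargetRef:
    template:
      spec:
        containers:
        - name: render
          image: nvidia/cuda/ubuntu image here # same placeholder as the Deployment above
          resources:
            limits:
              nvidia.com/gpu: 1
        restartPolicy: Never
    backoffLimit: 2
  pollingInterval: 10 # seconds between queue checks
  maxReplicaCount: 100 # upper bound on concurrent Jobs
  triggers:
  - type: aws-sqs-queue
    authenticationRef:
      name: keda-trigger-auth-aws-credentials
    metadata:
      queueURL: myQueue
      queueLength: "1" # roughly one queued message per Job
      awsRegion: "eu-west-1"
With a setup like this, each queued message results in roughly one Kubernetes Job (up to maxReplicaCount), and the cluster autoscaler can add or remove GPU nodes to fit the pending pods, which is close to the "boot up as many instances as needed" behaviour you described.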
P.S. There may be other ways of scaling besides CPU; you may want to check with Google App Engine support.
Upvotes: 1