Alechko

Reputation: 1588

Debugging why Reconcile triggers on Kubernetes Custom Operator

I have a custom operator that watches a custom resource (CR) whose CRD I've defined in a Kubernetes cluster.

Whenever the custom resource changes, the operator reconciles and idempotently creates a secret (owned by the custom resource).


What I expect is for the operator to Reconcile only when something changes in the custom resource or in the secret owned by it.

What I observe is that the Reconcile function triggers for every CR on the cluster at strange intervals, without any observable changes to related entities. I focused on a specific instance of the CR and tracked the times at which Reconcile was called for it. The calls seem to alternate between two series of intervals: one starts at 10 hours and shrinks by 7 minutes each time, the other starts at 7 minutes and grows by 7 minutes each time.

To demonstrate, Reconcile triggered at these times (give or take a few seconds):

00:00
09:53 (10 hours - 1*7 minute interval)
10:00 (0 hours  + 1*7 minute interval)
19:46 (10 hours - 2*7 minute interval)
20:00 (0 hours  + 2*7 minute interval)
29:39 (10 hours - 3*7 minute interval)
30:00 (0 hours  + 3*7 minute interval)

Whenever the diminishing intervals drop below 7 hours, they reset back to 10 hours. The same goes for the growing series: as soon as the intervals exceed 3 hours, they reset back to 7 minutes.


My main question is: how can I investigate why Reconcile is being triggered?

Here are the manifests for the CRD, a sample CR, and the operator Deployment:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.4.1
  creationTimestamp: "2021-10-13T11:04:42Z"
  generation: 1
  name: databaseservices.operators.talon.one
  resourceVersion: "245688703"
  uid: 477f8d3e-c19b-43d7-ab59-65198b3c0108
spec:
  conversion:
    strategy: None
  group: operators.talon.one
  names:
    kind: DatabaseService
    listKind: DatabaseServiceList
    plural: databaseservices
    singular: databaseservice
  scope: Namespaced
  versions:
  - name: v1alpha1
    schema:
      openAPIV3Schema:
        description: DatabaseService is the Schema for the databaseservices API
        properties:
          apiVersion:
            description: 'APIVersion defines the versioned schema of this representation
              of an object. Servers should convert recognized schemas to the latest
              internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
            type: string
          kind:
            description: 'Kind is a string value representing the REST resource this
              object represents. Servers may infer this from the endpoint the client
              submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
            type: string
          metadata:
            type: object
          spec:
            description: DatabaseServiceSpec defines the desired state of DatabaseService
            properties:
              cloud:
                type: string
              databaseName:
                description: Foo is an example field of DatabaseService. Edit databaseservice_types.go
                  to remove/update
                type: string
              serviceName:
                type: string
              servicePlan:
                type: string
            required:
            - cloud
            - databaseName
            - serviceName
            - servicePlan
            type: object
          status:
            description: DatabaseServiceStatus defines the observed state of DatabaseService
            type: object
        type: object
    served: true
    storage: true
    subresources:
      status: {}
status:
  acceptedNames:
    kind: DatabaseService
    listKind: DatabaseServiceList
    plural: databaseservices
    singular: databaseservice
  conditions:
  - lastTransitionTime: "2021-10-13T11:04:42Z"
    message: no conflicts found
    reason: NoConflicts
    status: "True"
    type: NamesAccepted
  - lastTransitionTime: "2021-10-13T11:04:42Z"
    message: the initial names have been accepted
    reason: InitialNamesAccepted
    status: "True"
    type: Established
  storedVersions:
  - v1alpha1


----

apiVersion: operators.talon.one/v1alpha1
kind: DatabaseService
metadata:
  creationTimestamp: "2021-10-13T11:14:08Z"
  generation: 1
  labels:
    app: talon
    company: amber
    repo: talon-service
  name: db-service-secret
  namespace: amber
  resourceVersion: "245692590"
  uid: cc369297-6825-4fbf-aa0b-58c24be427b0
spec:
  cloud: google-australia-southeast1
  databaseName: amber
  serviceName: pg-amber
  servicePlan: business-4

----

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "75"
    secret.reloader.stakater.com/reload: db-credentials
    simpledeployer.talon.one/image: <path_to_image>/production:latest
  creationTimestamp: "2020-06-22T09:20:06Z"
  generation: 77
  labels:
    simpledeployer.talon.one/enabled: "true"
  name: db-operator
  namespace: db-operator
  resourceVersion: "245688814"
  uid: 900424cd-b469-11ea-b661-4201ac100014
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      name: db-operator
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        name: db-operator
    spec:
      containers:
      - command:
        - app/db-operator
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: OPERATOR_NAME
          value: db-operator
        - name: AIVEN_PASSWORD
          valueFrom:
            secretKeyRef:
              key: password
              name: db-credentials
        - name: AIVEN_PROJECT
          valueFrom:
            secretKeyRef:
              key: projectname
              name: db-credentials
        - name: AIVEN_USERNAME
          valueFrom:
            secretKeyRef:
              key: username
              name: db-credentials
        - name: SENTRY_URL
          valueFrom:
            secretKeyRef:
              key: sentry_url
              name: db-credentials
        - name: ROTATION_INTERVAL
          value: monthly
        image: <path_to_image>/production@sha256:<some_sha>
        imagePullPolicy: Always
        name: db-operator
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: db-operator
      serviceAccountName: db-operator
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2020-06-22T09:20:06Z"
    lastUpdateTime: "2021-09-07T11:56:07Z"
    message: ReplicaSet "db-operator-cb6556b76" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2021-09-12T03:56:19Z"
    lastUpdateTime: "2021-09-12T03:56:19Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 77
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

Note: my Reconcile function returns

return ctrl.Result{Requeue: false, RequeueAfter: 0}

So that shouldn't be the reason for the repeated triggers.

Upvotes: 1

Views: 2354

Answers (2)

feng hong

Reputation: 29

I have the same problem. My Reconcile triggered at these times:

00:00
09:03 (9 hours + 3 min)
18:06 (9 hours + 3 min)
00:09 (9 hours + 3 min)

The sync period is not set, so it should be the default. Kubernetes version is 1.20.11.

Upvotes: 0

Ashish Raman

Reputation: 41

This would require more information about how your controller is set up, for example which sync period you have configured. The triggers could come from the default sync period, which reconciles all watched objects at a fixed interval.

From the controller-runtime documentation: SyncPeriod determines the minimum frequency at which watched resources are reconciled. A lower period will correct entropy more quickly, but reduce responsiveness to change if there are many watched resources. Change this value only if you know what you are doing. Defaults to 10 hours if unset. There will be a 10 percent jitter between the SyncPeriod of all controllers so that all controllers will not send list requests simultaneously.

For more information check this: https://github.com/kubernetes-sigs/controller-runtime/blob/v0.11.2/pkg/manager/manager.go#L134
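As for how to investigate: a periodic resync is delivered to the controller as an update event in which the old and new object have the same metadata.resourceVersion, whereas a real change always produces a new resourceVersion. Logging that comparison for each update event should show whether the mystery triggers are resyncs. The sketch below shows only the comparison, using a hypothetical `objectMeta` struct instead of the real API types; the second resourceVersion value is made up. In an actual operator the same check would go into an update predicate on the controller.

```go
package main

import "fmt"

// objectMeta is a hypothetical stand-in for the metadata of a watched object.
type objectMeta struct {
	Name            string
	ResourceVersion string
}

// isResync reports whether an update event carries no actual change,
// that is, the old and new objects share the same resourceVersion.
func isResync(oldObj, newObj objectMeta) bool {
	return oldObj.ResourceVersion == newObj.ResourceVersion
}

func main() {
	cr := objectMeta{Name: "db-service-secret", ResourceVersion: "245692590"}
	// Made-up resourceVersion representing the CR after a real edit.
	edited := objectMeta{Name: "db-service-secret", ResourceVersion: "245700001"}

	fmt.Println("same object replayed, resync:", isResync(cr, cr))
	fmt.Println("object actually edited, resync:", isResync(cr, edited))
}
```

controller-runtime also provides predicate.ResourceVersionChangedPredicate, which filters out update events whose resourceVersion did not change, if skipping resync-driven reconciles is acceptable for your operator.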

Upvotes: 1
