Chandra Sekar

Reputation: 763

Redis pod failing

I have a Redis DB running on my minikube cluster. I shut down my minikube and started it again after 3 days, and now my Redis pod is failing to come up with the below error in the pod log:

Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>.

Below is the StatefulSet YAML for the Redis master, deployed via a Helm chart:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    meta.helm.sh/release-name: test-redis
    meta.helm.sh/release-namespace: test
  generation: 1
  labels:
    app.kubernetes.io/component: master
    app.kubernetes.io/instance: test-redis
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: redis
    helm.sh/chart: redis-14.8.11
  name: test-redis-master
  namespace: test
  resourceVersion: "191902"
  uid: 3a4e541f-154f-4c54-a379-63974d90089e
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: master
      app.kubernetes.io/instance: test-redis
      app.kubernetes.io/name: redis
  serviceName: test-redis-headless
  template:
    metadata:
      annotations:
        checksum/configmap: dd1f90e0231e5f9ebd1f3f687d534d9ec53df571cba9c23274b749c01e5bc2bb
        checksum/health: xxxxx
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: master
        app.kubernetes.io/instance: test-redis
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: redis
        helm.sh/chart: redis-14.8.11
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchLabels:
                  app.kubernetes.io/component: master
                  app.kubernetes.io/instance: test-redis
                  app.kubernetes.io/name: redis
              namespaces:
              - tyk
              topologyKey: kubernetes.io/hostname
            weight: 1
      containers:
      - args:
        - -c
        - /opt/bitnami/scripts/start-scripts/start-master.sh
        command:
        - /bin/bash
        env:
        - name: BITNAMI_DEBUG
          value: "false"
        - name: REDIS_REPLICATION_MODE
          value: master
        - name: ALLOW_EMPTY_PASSWORD
          value: "no"
        - name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              key: redis-password
              name: test-redis
        - name: REDIS_TLS_ENABLED
          value: "no"
        - name: REDIS_PORT
          value: "6379"
        image: docker.io/bitnami/redis:6.2.5-debian-10-r11
        imagePullPolicy: IfNotPresent
        livenessProbe:
          exec:
            command:
            - sh
            - -c
            - /health/ping_liveness_local.sh 5
          failureThreshold: 5
          initialDelaySeconds: 20
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 6
        name: redis
        ports:
        - containerPort: 6379
          name: redis
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - sh
            - -c
            - /health/ping_readiness_local.sh 1
          failureThreshold: 5
          initialDelaySeconds: 20
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 2
        resources: {}
        securityContext:
          runAsUser: 1001
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /opt/bitnami/scripts/start-scripts
          name: start-scripts
        - mountPath: /health
          name: health
        - mountPath: /data
          name: redis-data
        - mountPath: /opt/bitnami/redis/mounted-etc
          name: config
        - mountPath: /opt/bitnami/redis/etc/
          name: redis-tmp-conf
        - mountPath: /tmp
          name: tmp
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1001
      serviceAccount: test-redis
      serviceAccountName: test-redis
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 493
          name: test-redis-scripts
        name: start-scripts
      - configMap:
          defaultMode: 493
          name: test-redis-health
        name: health
      - configMap:
          defaultMode: 420
          name: test-redis-configuration
        name: config
      - emptyDir: {}
        name: redis-tmp-conf
      - emptyDir: {}
        name: tmp
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: master
        app.kubernetes.io/instance: test-redis
        app.kubernetes.io/name: redis
      name: redis-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 8Gi
      volumeMode: Filesystem
    status:
      phase: Pending

Please let me know your suggestions on how I can fix this.

Upvotes: 7

Views: 11509

Answers (2)

Mark

Reputation: 4067

I am not a Redis expert, but from what I can see:

kubectl describe pod red3-redis-master-0
...
Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>
...

This means that your appendonly.aof file was corrupted, with invalid byte sequences in the middle.

How can we proceed if redis-master is not working?

  • Verify the PVC attached to the redis-master pod:
kubectl get pvc

NAME                               STATUS   VOLUME                                    
redis-data-red3-redis-master-0     Bound    pvc-cf59a0b2-a3ee-4f7f-9f07-8f4922518359  
  • Create a new redis-client pod with the same PVC redis-data-red3-redis-master-0:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: redis-client
spec:
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: redis-data-red3-redis-master-0
  containers:
    - name: redis
      image: docker.io/bitnami/redis:6.2.3-debian-10-r0
      command: ["/bin/bash"]
      args: ["-c", "sleep infinity"]
      volumeMounts:
        - mountPath: "/tmp"
          name: data
EOF
  • Back up your files:
kubectl cp redis-client:/tmp .
  • Repair the appendonly.aof file:
kubectl exec -it redis-client -- /bin/bash

cd /tmp

# make copy of appendonly.aof file:
cp appendonly.aof appendonly.aofbackup

# verify appendonly.aof file:
redis-check-aof appendonly.aof

...
0x              38: Expected prefix '*', got: '"'
AOF analyzed: size=62, ok_up_to=56, ok_up_to_line=13, diff=6
AOF is not valid. Use the --fix option to try fixing it.
...

# repair appendonly.aof file:
redis-check-aof --fix appendonly.aof

# compare files using diff:
diff appendonly.aof appendonly.aofbackup

Note:

As per docs:

The best thing to do is to run the redis-check-aof utility, initially without the --fix option, then understand the problem, jump at the given offset in the file, and see if it is possible to manually repair the file: the AOF uses the same format of the Redis protocol and is quite simple to fix manually. Otherwise it is possible to let the utility fix the file for us, but in that case all the AOF portion from the invalid part to the end of the file may be discarded, leading to a massive amount of data loss if the corruption happened to be in the initial part of the file.
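Since the AOF is just the Redis wire protocol (RESP) replayed to disk, you can inspect the bytes around the reported offset yourself before deciding whether to hand-repair. A minimal sketch (the scratch file path is made up for illustration): each command is stored as a RESP array header *<argc> followed by $<len>/payload pairs, so every entry must begin with '*' — exactly the prefix the checker complained about above.

```shell
# Write a minimal, well-formed AOF entry (SET foo bar) to a scratch file.
# In RESP, "*3" means an array of 3 arguments and "$3" means the next
# argument is 3 bytes long; lines are terminated with \r\n.
printf '*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n' > /tmp/aof-demo.aof

# Every entry starts with '*'; any other byte at an entry boundary is what
# produces "Expected prefix '*'" from redis-check-aof.
head -c 2 /tmp/aof-demo.aof   # prints: *3
```

Seeing where the framing breaks (the 0x38 offset in the output above) tells you whether the corruption is near the end (little loss if truncated by --fix) or near the start (potentially large loss).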

In addition, as described in the comments by @Miffa Young, you can verify where your data is stored by the k8s.io/minikube-hostpath provisioner:

kubectl get pv 
...
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                      
pvc-cf59a0b2-a3ee-4f7f-9f07-8f4922518359   8Gi        RWO            Delete           Bound    default/redis-data-red3-redis-master-0     
...

kubectl describe pv pvc-cf59a0b2-a3ee-4f7f-9f07-8f4922518359
...
Source:
    Type:          HostPath (bare host directory volume)
    Path:          /tmp/hostpath-provisioner/default/redis-data-red3-redis-master-0
...

Your Redis instance is failing because your appendonly.aof file is malformed and persisted under this location.

You can SSH into your VM:

minikube -p redis ssh 
cd /tmp/hostpath-provisioner/default/redis-data-red3-redis-master-0
# from there you can backup/repair/remove your files:
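Inside the minikube VM the Redis tooling is not available, so the practical options from there are to keep a backup and move the corrupted AOF aside so Redis starts with an empty dataset (losing the AOF contents). A sketch, using a stand-in directory instead of the real hostpath shown above:

```shell
# Stand-in directory for the hostpath-provisioner location (illustration only).
DATA_DIR=/tmp/redis-data-demo
mkdir -p "$DATA_DIR"
printf 'corrupt-bytes' > "$DATA_DIR/appendonly.aof"   # placeholder for the broken file

# Keep a backup, then remove the corrupted AOF so Redis can start fresh:
cp "$DATA_DIR/appendonly.aof" "$DATA_DIR/appendonly.aof.bak"
rm "$DATA_DIR/appendonly.aof"
```

Deleting the AOF means the pod comes up empty, so only do this after the backup step if the data matters.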

Another solution is to install this chart under a new release name; in that case a new set of PVs/PVCs for the Redis StatefulSet will be created.
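For example (the release name test-redis2 is hypothetical), reinstalling under a new name makes the chart create a fresh PVC such as redis-data-test-redis2-master-0 instead of reusing the corrupted one:

```shell
# Hypothetical new release name; the chart provisions a fresh, empty PVC for it.
helm install test-redis2 bitnami/redis --namespace test
```

Note this starts from an empty dataset and leaves the old PVC (and its corrupted AOF) in place until you delete it.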

Upvotes: 14

Miffa Young

Reputation: 117

  • I think your Redis did not quit gracefully, so the AOF file is in a bad format. What is AOF

  • You should repair the AOF file using an initContainer running the command (./redis-check-aof --fix .):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    meta.helm.sh/release-name: test-redis
    meta.helm.sh/release-namespace: test
  generation: 1
  labels:
    app.kubernetes.io/component: master
    app.kubernetes.io/instance: test-redis
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: redis
    helm.sh/chart: redis-14.8.11
  name: test-redis-master
  namespace: test
  resourceVersion: "191902"
  uid: 3a4e541f-154f-4c54-a379-63974d90089e
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: master
      app.kubernetes.io/instance: test-redis
      app.kubernetes.io/name: redis
  serviceName: test-redis-headless
  template:
    metadata:
      annotations:
        checksum/configmap: dd1f90e0231e5f9ebd1f3f687d534d9ec53df571cba9c23274b749c01e5bc2bb
        checksum/health: xxxxx
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: master
        app.kubernetes.io/instance: test-redis
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: redis
        helm.sh/chart: redis-14.8.11
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchLabels:
                  app.kubernetes.io/component: master
                  app.kubernetes.io/instance: test-redis
                  app.kubernetes.io/name: redis
              namespaces:
              - tyk
              topologyKey: kubernetes.io/hostname
            weight: 1
      initContainers:
      - name: repair-redis
        image: docker.io/bitnami/redis:6.2.5-debian-10-r11
        command: ['sh', '-c', "redis-check-aof --fix  /data/appendonly.aof"]
      containers:
      - args:
        - -c
        - /opt/bitnami/scripts/start-scripts/start-master.sh
        command:
        - /bin/bash
        env:
        - name: BITNAMI_DEBUG
          value: "false"
        - name: REDIS_REPLICATION_MODE
          value: master
        - name: ALLOW_EMPTY_PASSWORD
          value: "no"
        - name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              key: redis-password
              name: test-redis
        - name: REDIS_TLS_ENABLED
          value: "no"
        - name: REDIS_PORT
          value: "6379"
        image: docker.io/bitnami/redis:6.2.5-debian-10-r11
        imagePullPolicy: IfNotPresent
        livenessProbe:
          exec:
            command:
            - sh
            - -c
            - /health/ping_liveness_local.sh 5
          failureThreshold: 5
          initialDelaySeconds: 20
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 6
        name: redis
        ports:
        - containerPort: 6379
          name: redis
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - sh
            - -c
            - /health/ping_readiness_local.sh 1
          failureThreshold: 5
          initialDelaySeconds: 20
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 2
        resources: {}
        securityContext:
          runAsUser: 1001
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /opt/bitnami/scripts/start-scripts
          name: start-scripts
        - mountPath: /health
          name: health
        - mountPath: /data
          name: redis-data
        - mountPath: /opt/bitnami/redis/mounted-etc
          name: config
        - mountPath: /opt/bitnami/redis/etc/
          name: redis-tmp-conf
        - mountPath: /tmp
          name: tmp
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1001
      serviceAccount: test-redis
      serviceAccountName: test-redis
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 493
          name: test-redis-scripts
        name: start-scripts
      - configMap:
          defaultMode: 493
          name: test-redis-health
        name: health
      - configMap:
          defaultMode: 420
          name: test-redis-configuration
        name: config
      - emptyDir: {}
        name: redis-tmp-conf
      - emptyDir: {}
        name: tmp
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: master
        app.kubernetes.io/instance: test-redis
        app.kubernetes.io/name: redis
      name: redis-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 8Gi
      volumeMode: Filesystem

Upvotes: 1
