zhwlx22

Reputation: 171

"Opening storage failed" err="invalid block sequence"

What did you do?

I ran Prometheus 2.0.0 on Kubernetes v1.8.5.

What did you expect to see?

I expected everything to keep running normally.

What did you see instead? Under which circumstances?

Everything went well at the beginning, but several hours later the pods' statuses turned to "CrashLoopBackOff" and all of the Prometheus instances became unavailable. Nothing was changed after the Prometheus pods were created.

    [root@k8s-1 prometheus]# kubectl get all -n monitoring
    NAME                          DESIRED   CURRENT   AGE
    statefulsets/prometheus-k8s   0         2         16h

    NAME                  READY     STATUS             RESTARTS   AGE
    po/prometheus-k8s-0   0/1       CrashLoopBackOff   81         16h
    po/prometheus-k8s-1   0/1       CrashLoopBackOff   22         16h
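
The actual startup error from Prometheus (the "Opening storage failed" line quoted in the title) comes from the crashed container itself; it can be pulled with, for example:

    [root@k8s-1 prometheus]# kubectl logs -n monitoring prometheus-k8s-0 --previous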

Environment

    [root@k8s-1 prometheus]# kubectl version --short
    Client Version: v1.8.5
    Server Version: v1.8.5

    [root@k8s-1 prometheus]# docker images | grep -i prometheus
    quay.io/prometheus/alertmanager                          v0.12.0             f87cbd5f1360        5 weeks ago         31.2 MB
    quay.io/prometheus/node_exporter                         v0.15.2             ff5ecdcfc4a2        6 weeks ago         22.8 MB
    quay.io/prometheus/prometheus                            v2.0.0              67141fa03496        2 months ago        80.2 MB

The Kubernetes manifest (prometheus-all-together.yaml):

    [root@k8s-1 prometheus]# cat prometheus-all-together.yaml
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        prometheus: k8s
      name: prometheus-k8s
      namespace: monitoring
      annotations:
        prometheus.io/scrape: "true"
    spec:
      ports:
      - name: web
        nodePort: 30900
        port: 9090
        protocol: TCP
        targetPort: web
      selector:
        prometheus: k8s
      sessionAffinity: None
      type: NodePort
    ---
    apiVersion: apps/v1beta1
    kind: StatefulSet
    metadata:
      labels:
        prometheus: k8s
      name: prometheus-k8s
      namespace: monitoring
    spec:
      selector:
        matchLabels:
          app: prometheus
          prometheus: k8s
      serviceName: prometheus-k8s
      replicas: 2
      template:
        metadata:
          labels:
            app: prometheus
            prometheus: k8s
        spec:
          securityContext:
            runAsUser: 65534
            fsGroup: 65534
            runAsNonRoot: true
          containers:
          - args:
            - --config.file=/etc/prometheus/config/prometheus.yaml
            - --storage.tsdb.path=/cephfs/prometheus/data
            - --storage.tsdb.retention=180d
            - --web.route-prefix=/
            - --web.enable-lifecycle
            - --web.enable-admin-api
            image: quay.io/prometheus/prometheus:v2.0.0
            imagePullPolicy: IfNotPresent
            livenessProbe:
              failureThreshold: 10
              httpGet:
                path: /status
                port: web
                scheme: HTTP
              initialDelaySeconds: 30
              periodSeconds: 5
              successThreshold: 1
              timeoutSeconds: 3
            name: prometheus
            ports:
            - containerPort: 9090
              name: web
              protocol: TCP
            readinessProbe:
              failureThreshold: 6
              httpGet:
                path: /status
                port: web
                scheme: HTTP
              periodSeconds: 5
              successThreshold: 1
              timeoutSeconds: 3
            resources:
              requests:
                cpu: 100m
                memory: 200Mi
              limits:
                cpu: 500m
                memory: 500Mi
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /etc/prometheus/config
              name: config
              readOnly: false
            - mountPath: /etc/prometheus/rules
              name: rules
              readOnly: false
            - mountPath: /cephfs/prometheus/data
              name: data
              subPath: prometheus-data
              readOnly: false
          serviceAccount: prometheus-k8s
          serviceAccountName: prometheus-k8s
          terminationGracePeriodSeconds: 60
          volumes:
          - configMap:
              defaultMode: 511
              name: prometheus-k8s-config
            name: config
          - configMap:
              defaultMode: 511
              name: prometheus-k8s-rules
            name: rules
          - name: data
            persistentVolumeClaim:
              claimName: cephfs-pvc
      updateStrategy:
        type: RollingUpdate

Kubelet logs on the Kubernetes node:

    [root@k8s-3 01C48JAGH1QCGKGCG72E0B2Y8R]# journalctl -xeu kubelet --no-pager
    Jan 20 11:21:54 k8s-3 kubelet[14306]: I0120 11:21:54.619924   14306 kuberuntime_manager.go:749] Back-off 5m0s restarting failed container=prometheus pod=prometheus-k8s-0_monitoring(7598959a-fcff-11e7-9333-fa163e48f857)
    Jan 20 11:21:54 k8s-3 kubelet[14306]: E0120 11:21:54.620042   14306 pod_workers.go:182] Error syncing pod 7598959a-fcff-11e7-9333-fa163e48f857 ("prometheus-k8s-0_monitoring(7598959a-fcff-11e7-9333-fa163e48f857)"), skipping: failed to "StartContainer" for "prometheus" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=prometheus pod=prometheus-k8s-0_monitoring(7598959a-fcff-11e7-9333-fa163e48f857)"
    Jan 20 11:22:08 k8s-3 kubelet[14306]: I0120 11:22:08.615438   14306 kuberuntime_manager.go:500] Container {Name:prometheus Image:quay.io/prometheus/prometheus:v2.0.0 Command:[] Args:[--config.file=/etc/prometheus/config/prometheus.yaml --storage.tsdb.path=/cephfs/prometheus/data --storage.tsdb.retention=180d --web.route-prefix=/ --web.enable-lifecycle --web.enable-admin-api] WorkingDir: Ports:[{Name:web HostPort:0 ContainerPort:9090 Protocol:TCP HostIP:}] EnvFrom:[] Env:[] Resources:{Limits:map[cpu:{i:{value:500 scale:-3} d:{Dec:<nil>} s:500m Format:DecimalSI} memory:{i:{value:524288000 scale:0} d:{Dec:<nil>} s:500Mi Format:BinarySI}] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} memory:{i:{value:209715200 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]} VolumeMounts:[{Name:config ReadOnly:false MountPath:/etc/prometheus/config SubPath: MountPropagation:<nil>} {Name:rules ReadOnly:false MountPath:/etc/prometheus/rules SubPath: MountPropagation:<nil>} {Name:data ReadOnly:false MountPath:/cephfs/prometheus/data SubPath:prometheus-data MountPropagation:<nil>} {Name:prometheus-k8s-token-x8xzh ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/status,Port:web,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:30,TimeoutSeconds:3,PeriodSeconds:5,SuccessThreshold:1,FailureThreshold:10,} ReadinessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/status,Port:web,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:0,TimeoutSeconds:3,PeriodSeconds:5,SuccessThreshold:1,FailureThreshold:6,} Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
    Jan 20 11:22:08 k8s-3 kubelet[14306]: I0120 11:22:08.615662   14306 kuberuntime_manager.go:739] checking backoff for container "prometheus" in pod "prometheus-k8s-0_monitoring(7598959a-fcff-11e7-9333-fa163e48f857)"

Any suggestions? Thanks.

Upvotes: 0

Views: 2187

Answers (1)

brian-brazil

Reputation: 34122

Two Prometheus servers cannot share the same storage directory; you should have gotten a locking error about this.
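
In the manifest above both replicas mount the same claim (claimName: cephfs-pvc) at the same subPath, so the two pods write into one TSDB directory. One way to give each replica its own directory is to drop the shared data volume and let the StatefulSet create a PVC per pod with volumeClaimTemplates. A minimal sketch of just the parts that change (the storage class name and size below are placeholders, not taken from the question):

    # Only the fields that change are shown; the rest of the StatefulSet stays as posted.
    spec:
      template:
        spec:
          containers:
          - name: prometheus
            volumeMounts:
            # the "data" mount stays, but it no longer needs the shared subPath
            - mountPath: /cephfs/prometheus/data
              name: data
          volumes:
          # keep the config and rules configMaps; remove the shared
          # persistentVolumeClaim entry for "data"
          - configMap:
              name: prometheus-k8s-config
            name: config
          - configMap:
              name: prometheus-k8s-rules
            name: rules
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          accessModes: ["ReadWriteOnce"]
          storageClassName: cephfs      # placeholder: use whatever storage class your cluster provides
          resources:
            requests:
              storage: 50Gi             # placeholder: size it for the 180d retention

With this, the controller creates one claim per pod (data-prometheus-k8s-0, data-prometheus-k8s-1), so the two servers no longer contend for a single lock and data directory. Any data already written under the shared cephfs-pvc claim would need to be migrated or discarded.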

Upvotes: 1
