Reputation: 39005
I added Prometheus (prom/prometheus:v2.16.0) and Alertmanager, and now I am adding a rule config in prometheus-configmap.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  prometheus.yml: |
    rule_files:
    - /etc/prometheus/rules.yml
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ["alertmanager:9093"]
    scrape_configs:
    - job_name: traefik
      metrics_path: /metrics
      static_configs:
      - targets:
        - traefik.kube-system.svc.cluster.local:8080
  rules.yml: |
    groups:
    - name: test-rule
      rules:
      - alert: NodeFilesystemUsage
        expr: (node_filesystem_size{device="rootfs"} - node_filesystem_free{device="rootfs"}) / node_filesystem_size{device="rootfs"} * 100 > 80
        for: 2m
        labels:
          team: node
        annotations:
          summary: "{{$labels.instance}}: High Filesystem usage detected"
          description: "{{$labels.instance}}: Filesystem usage is above 80% (current value is: {{ $value }})"
Then I refreshed the config:
kubectl apply -f prometheus-configmap.yaml
kubectl exec -it soa-room-service-686959b94d-9g5q2 -- /bin/bash
curl -X POST http://prometheus.kube-system.svc.cluster.local:9090/-/reload
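To check whether the reload was actually applied, the running config can be queried from the Prometheus HTTP API (a sketch, using the same in-cluster service name as above):
# A 200 response means the lifecycle reload endpoint accepted the request
curl -X POST http://prometheus.kube-system.svc.cluster.local:9090/-/reload
# Dump the config Prometheus is currently running with
curl http://prometheus.kube-system.svc.cluster.local:9090/api/v1/status/config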
The Prometheus dashboard shows the config like this:
global:
  scrape_interval: 1m
  scrape_timeout: 10s
  evaluation_interval: 1m
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager:9093
    scheme: http
    timeout: 10s
    api_version: v1
rule_files:
- /etc/prometheus/rules.yml
scrape_configs:
- job_name: traefik
  honor_timestamps: true
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - traefik.kube-system.svc.cluster.local:8080
The alerting rules are not loaded. What should I do to make them work?
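One way to see whether any rule groups were loaded at all (assuming the same service DNS name as above) is the rules endpoint of the HTTP API:
# An empty "groups" array in the response means no rule file was read
curl http://prometheus.kube-system.svc.cluster.local:9090/api/v1/rules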
This is how I installed Prometheus:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    k8s-app: prometheus
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v2.2.1
spec:
  serviceName: "prometheus"
  replicas: 1
  podManagementPolicy: "Parallel"
  updateStrategy:
    type: "RollingUpdate"
  selector:
    matchLabels:
      k8s-app: prometheus
  template:
    metadata:
      labels:
        k8s-app: prometheus
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: prometheus
      initContainers:
      - name: "init-chown-data"
        image: "busybox:latest"
        imagePullPolicy: "IfNotPresent"
        command: ["chown", "-R", "65534:65534", "/data"]
        volumeMounts:
        - name: prometheus-data
          mountPath: /data
          subPath: ""
      containers:
      - name: prometheus-server-configmap-reload
        image: "jimmidyson/configmap-reload:v0.1"
        imagePullPolicy: "IfNotPresent"
        args:
        - --volume-dir=/etc/config
        - --webhook-url=http://localhost:9090/-/reload
        volumeMounts:
        - name: config-volume
          mountPath: /etc/config
          readOnly: true
        resources:
          limits:
            cpu: 10m
            memory: 10Mi
          requests:
            cpu: 10m
            memory: 10Mi
      - name: prometheus-server
        image: "prom/prometheus:v2.16.0"
        imagePullPolicy: "IfNotPresent"
        args:
        - --config.file=/etc/config/prometheus.yml
        - --storage.tsdb.path=/data
        - --web.console.libraries=/etc/prometheus/console_libraries
        - --web.console.templates=/etc/prometheus/consoles
        - --web.enable-lifecycle
        ports:
        - containerPort: 9090
        readinessProbe:
          httpGet:
            path: /-/ready
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
        # based on 10 running nodes with 30 pods each
        resources:
          limits:
            cpu: 200m
            memory: 1000Mi
          requests:
            cpu: 200m
            memory: 1000Mi
        volumeMounts:
        - name: config-volume
          mountPath: /etc/config
        - name: prometheus-data
          mountPath: /data
          subPath: ""
      terminationGracePeriodSeconds: 300
      volumes:
      - name: config-volume
        configMap:
          name: prometheus-config
  volumeClaimTemplates:
  - metadata:
      name: prometheus-data
    spec:
      storageClassName: standard
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: "16Gi"
This is my pod describe output:
kubectl describe pods prometheus-0 -n kube-system

Name:                 prometheus-0
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 azshara-k8s01/172.19.104.231
Start Time:           Wed, 11 Mar 2020 19:28:28 +0800
Labels:               controller-revision-hash=prometheus-cf5dc9d8b
                      k8s-app=prometheus
                      statefulset.kubernetes.io/pod-name=prometheus-0
Annotations:          scheduler.alpha.kubernetes.io/critical-pod:
Status:               Running
IP:                   172.30.224.4
IPs:                  <none>
Controlled By:        StatefulSet/prometheus
Init Containers:
  init-chown-data:
    Container ID:  docker://a3adc4bce1dccbdd6adb27ca38c54b7ae670d605b6273d53e85f601649357709
    Image:         busybox:latest
    Image ID:      docker-pullable://busybox@sha256:b26cd013274a657b86e706210ddd5cc1f82f50155791199d29b9e86e935ce135
    Port:          <none>
    Host Port:     <none>
    Command:
      chown
      -R
      65534:65534
      /data
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 11 Mar 2020 19:28:29 +0800
      Finished:     Wed, 11 Mar 2020 19:28:29 +0800
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /data from prometheus-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-k8d22 (ro)
Containers:
  prometheus-server-configmap-reload:
    Container ID:  docker://9d31d10c9246ddfa94d84d59737edd03f06e008960657b000461ae886d030516
    Image:         jimmidyson/configmap-reload:v0.1
    Image ID:      docker-pullable://jimmidyson/configmap-reload@sha256:2d40c2eaa6f435b2511d0cfc5f6c0a681eeb2eaa455a5d5ac25f88ce5139986e
    Port:          <none>
    Host Port:     <none>
    Args:
      --volume-dir=/etc/config
      --webhook-url=http://localhost:9090/-/reload
    State:          Running
      Started:      Wed, 11 Mar 2020 19:28:30 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     10m
      memory:  10Mi
    Requests:
      cpu:        10m
      memory:     10Mi
    Environment:  <none>
    Mounts:
      /etc/config from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-k8d22 (ro)
  prometheus-server:
    Container ID:  docker://65d2870debb187a20a102786cac3725745e5bc0d60f3e04cb38c2beea6f5c128
    Image:         prom/prometheus:v2.16.0
    Image ID:      docker-pullable://prom/prometheus@sha256:e4ca62c0d62f3e886e684806dfe9d4e0cda60d54986898173c1083856cfda0f4
    Port:          9090/TCP
    Host Port:     0/TCP
    Args:
      --config.file=/etc/config/prometheus.yml
      --storage.tsdb.path=/data
      --web.console.libraries=/etc/prometheus/console_libraries
      --web.console.templates=/etc/prometheus/consoles
      --web.enable-lifecycle
    State:          Running
      Started:      Wed, 11 Mar 2020 19:28:30 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     200m
      memory:  1000Mi
    Requests:
      cpu:      200m
      memory:   1000Mi
    Liveness:   http-get http://:9090/-/healthy delay=30s timeout=30s period=10s #success=1 #failure=3
    Readiness:  http-get http://:9090/-/ready delay=30s timeout=30s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /data from prometheus-data (rw)
      /etc/config from config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-token-k8d22 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  prometheus-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  prometheus-data-prometheus-0
    ReadOnly:   false
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prometheus-config
    Optional:  false
  prometheus-token-k8d22:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prometheus-token-k8d22
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 360s
                 node.kubernetes.io/unreachable:NoExecute for 360s
Events:
  Type    Reason     Age  From                    Message
  ----    ------     ---- ----                    -------
  Normal  Scheduled  50m  default-scheduler       Successfully assigned kube-system/prometheus-0 to azshara-k8s01
  Normal  Pulled     50m  kubelet, azshara-k8s01  Container image "busybox:latest" already present on machine
  Normal  Created    50m  kubelet, azshara-k8s01  Created container init-chown-data
  Normal  Started    50m  kubelet, azshara-k8s01  Started container init-chown-data
  Normal  Pulled     50m  kubelet, azshara-k8s01  Container image "jimmidyson/configmap-reload:v0.1" already present on machine
  Normal  Created    50m  kubelet, azshara-k8s01  Created container prometheus-server-configmap-reload
  Normal  Started    50m  kubelet, azshara-k8s01  Started container prometheus-server-configmap-reload
  Normal  Pulled     50m  kubelet, azshara-k8s01  Container image "prom/prometheus:v2.16.0" already present on machine
  Normal  Created    50m  kubelet, azshara-k8s01  Created container prometheus-server
  Normal  Started    50m  kubelet, azshara-k8s01  Started container prometheus-server
Upvotes: 1
Views: 5203
Reputation: 39005
The rules.yml file is in the path /etc/config, not /etc/prometheus, because that is where the config-volume ConfigMap is mounted in the prometheus-server container. So change the path the rule files are read from; the rules path config looks like this:
rule_files:
- /etc/config/rules.yml
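If in doubt, one quick check (a sketch, assuming the pod and container names from the StatefulSet in the question) is to list the mount inside the pod; both prometheus.yml and rules.yml should show up under /etc/config:
# Lists the files materialized from the prometheus-config ConfigMap
kubectl exec -n kube-system prometheus-0 -c prometheus-server -- ls /etc/config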
Upvotes: 1
Reputation: 454
If Prometheus is deployed with the Prometheus Operator, you instead need to create a PrometheusRule object. Once you create a PrometheusRule object, the Operator automatically picks up the new alerting rules. Below is a sample:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: service-prometheus
    role: alert-rules
  name: prometheus-service-rules
  namespace: monitoring
spec:
  groups:
  - name: general.rules
    rules:
    - alert: TargetDown-serviceprom
      annotations:
        description: '{{ $value }}% of {{ $labels.job }} targets are down.'
        summary: Targets are down
      expr: 100 * (count(up == 0) BY (job) / count(up) BY (job)) > 10
      for: 10m
      labels:
        severity: warning
    - alert: DeadMansSwitch-serviceprom
      annotations:
        description: This is a DeadMansSwitch meant to ensure that the entire Alerting
          pipeline is functional.
        summary: Alerting DeadMansSwitch
      expr: vector(1)
      labels:
        severity: none
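To apply and verify it, something like this should work (prometheus-service-rules.yaml is a hypothetical file name for the manifest above):
# The Operator watches PrometheusRule objects whose labels match its ruleSelector
kubectl apply -f prometheus-service-rules.yaml
kubectl get prometheusrules -n monitoring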
Upvotes: 1
Reputation: 1847
There are a few possible ways of checking your configuration. I am not familiar with your Kubernetes setup, so I am not able to verify it for you, but I hope these pointers help.
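For example, promtool, which ships inside the prom/prometheus image, can validate both the main config and the rule files before you reload; a sketch, assuming the pod and mount paths from the question:
# Checks the config and follows its rule_files references
kubectl exec -n kube-system prometheus-0 -c prometheus-server -- promtool check config /etc/config/prometheus.yml
# Checks just the rule file
kubectl exec -n kube-system prometheus-0 -c prometheus-server -- promtool check rules /etc/config/rules.yml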
Upvotes: 5