We have a GKE cluster with:
We have Stackdriver Monitoring and Logging activated.
On 2018-01-22, the masters were upgraded by Google to version 1.7.11-gke.1.
After this upgrade, we have a lot of errors like these:
I 2018-01-25 11:35:23 +0000 [error]: Exception emitting record: No such file or directory @ sys_fail2 - (/var/log/fluentd-buffers/kubernetes.system.buffer..b5638802e3e04e72f.log, /var/log/fluentd-buffers/kubernetes.system.buffer..q5638802e3e04e72f.log)
I 2018-01-25 11:35:23 +0000 [warn]: emit transaction failed: error_class=Errno::ENOENT error="No such file or directory @ sys_fail2 - (/var/log/fluentd-buffers/kubernetes.system.buffer..b5638802e3e04e72f.log, /var/log/fluentd-buffers/kubernetes.system.buffer..q5638802e3e04e72f.log)" tag="docker"
I 2018-01-25 11:35:23 +0000 [warn]: suppressed same stacktrace
Those messages are flooding our logs (~25 GB per day) and are generated by pods managed by a DaemonSet called fluentd-gcp-v2.0.9.
We found that it is a bug that was fixed in 1.8 and backported to 1.7.12.
My questions are:
Thanks in advance.
Upvotes: 3
Views: 356
First of all, the answer to question 2.
As alternatives we could have:
To answer question 1:
We upgraded to 1.7.12 in a test environment. The process took 3 minutes; during that time we could not edit the cluster or access it with kubectl (as expected).
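For reference, a master upgrade like this can be requested with the standard gcloud CLI; the cluster name and zone below are placeholders, substitute your own:

```shell
# Upgrade only the masters to the fixed version.
# "my-cluster" and the zone are hypothetical - use your cluster's values.
gcloud container clusters upgrade my-cluster \
  --zone europe-west1-b \
  --master \
  --cluster-version 1.7.12
```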
After the upgrade, we deleted all our pods called fluentd-gcp-* and the flood stopped instantly:
for pod in $(kubectl get pods -n kube-system | grep fluentd-gcp | awk '{print $1}'); do
  kubectl -n kube-system delete pod "$pod"
  sleep 20
done
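Assuming the DaemonSet's pods carry the usual k8s-app=fluentd-gcp label (worth confirming first with kubectl get pods -n kube-system --show-labels), the same cleanup can be done in a single command; the DaemonSet recreates the pods automatically:

```shell
# Delete all fluentd-gcp pods at once via a label selector.
# The k8s-app=fluentd-gcp label is an assumption - verify it on your cluster.
kubectl -n kube-system delete pods -l k8s-app=fluentd-gcp
```

Note that this deletes all matching pods at once rather than one every 20 seconds, so log collection is briefly interrupted on every node simultaneously.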
Upvotes: 3