night-gold

Reputation: 2441

eks fluent-bit to elasticsearch timeout

So I had a working configuration with fluent-bit on EKS pointing at the AWS Elasticsearch Service, but for cost-saving purposes we deleted that cluster and spun up a single instance running a standalone Elasticsearch, which is enough for dev purposes (the AWS service doesn't cope well with only one instance).

The issue is that during this migration fluent-bit seems to have broken: I get lots of "[warn] failed to flush chunk" and some "[error] [upstream] connection #55 to ES-SERVER:9200 timed out after 10 seconds".

My current configuration:

[FILTER]
    Name                kubernetes
    Match               kube.*
    Kube_URL            https://kubernetes.default.svc:443
    Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
    Kube_Tag_Prefix     kube.var.log.containers.
    Merge_Log           On
    Merge_Log_Key       log_processed
    K8S-Logging.Parser  On
    K8S-Logging.Exclude Off
[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    Parser            docker
    DB                /var/log/flb_kube.db
    Mem_Buf_Limit     50MB
    Skip_Long_Lines   On
    Refresh_Interval  10
    Ignore_Older      1m

I think the issue is in one of these configuration sections: if I comment out the kubernetes filter the errors disappear, but then I lose the Kubernetes fields in the indices...

I tried tweaking some fluent-bit parameters to no avail. Does anyone have a suggestion?

So, the previous logs did not indicate anything, but I finally found something when activating Trace_Error in the elasticsearch output:

{"index":{"_index":"fluent-bit-2021.04.16","_type":"_doc","_id":"Xkxy     23gBidvuDr8mzw8W","status":400,"error":{"type":"mapper_parsing_exception","reas     on":"object mapping for [kubernetes.labels.app] tried to parse field [app] as o     bject, but found a concrete value"}}

Has anyone seen this error before and knows how to solve it?

Upvotes: 0

Views: 3850

Answers (1)

night-gold

Reputation: 2441

So, after looking into the logs and finding the mapping issue, I seem to have resolved it. The logs are now correctly parsed and sent to Elasticsearch.

To resolve it I had to increase the output retry limit and add the Replace_Dots option.

[OUTPUT]
    Name            es
    Match           *
    Host            ELASTICSERVER
    Port            9200
    Index           <fluent-bit-{now/d}>
    Retry_Limit     20
    Replace_Dots    On

It seems that at the beginning I had issues with the content being sent; because of that, the error appeared to continue after the change until a new index was created, which made me think the problem was still not resolved.
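For anyone hitting the same mapper_parsing_exception, my understanding (just how I read the error, the labels below are made up to illustrate) is that two pods end up fighting over the same field. One pod has a plain label, so the bulk payload contains:

    {"kubernetes": {"labels": {"app": "frontend"}}}

Another pod uses a dotted label such as app.kubernetes.io/name, which is sent with the key as-is:

    {"kubernetes": {"labels": {"app.kubernetes.io/name": "backend"}}}

Elasticsearch expands dots in field names into object paths, so one document maps kubernetes.labels.app as an object and the other as a concrete value; whichever arrives second gets rejected with this kind of error. With Replace_Dots On, fluent-bit rewrites the dots to underscores before sending:

    {"kubernetes": {"labels": {"app_kubernetes_io/name": "backend"}}}

so the two documents no longer collide on the same mapping.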

Upvotes: 2
