Anup

Reputation: 91

failed to flush the buffer in fluentd logging

I am getting the errors below while shipping application logs to Elasticsearch with Fluentd on Kubernetes. We are handling around 100M log events (about 400 TPS). The cluster runs on AWS m6g.2xlarge instances (8 cores, 32 GB RAM) with 3 master nodes and 20 data nodes. Under 200 TPS everything works fine; above 200 TPS I get these errors, Kibana lags behind, and there is data loss in ES.

ES version: 7.15.0
Fluentd version: 1.12.4

My logging flow: Fluentd > ES > Kibana

Error Log:

2022-02-04 00:37:53 +0530 [warn]: #0 failed to flush the buffer. retry_time=3 next_retry_seconds=2022-02-04 00:37:56 +0530 chunk="5d721d8e59e44f5bbbf4aa5e267f7e3e" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"eslogging-prod.abc.com\", :port=>80, :scheme=>\"http\"}): [429] {\"error\":{\"root_cause\":[{\"type\":\"es_rejected_execution_exception\",\"reason\":\"rejected execution of coordinating operation [coordinating_and_primary_bytes=1621708431, replica_bytes=0, all_bytes=1621708431, coordinating_operation_bytes=48222318, max_coordinating_and_primary_bytes=1655072358]\"}],\"type\":\"es_rejected_execution_exception\",\"reason\":\"rejected execution of coordinating operation [coordinating_and_primary_bytes=1621708431, replica_bytes=0, all_bytes=1621708431, coordinating_operation_bytes=48222318, max_coordinating_and_primary_bytes=1655072358]\"},\"status\":429}

Config file:

<source>
    @type tail
    @id in_tail_container_logs
    path /var/log/containers/*.log
    pos_file /var/log/fluentd-containers.log.pos
    tag kubernetes.*
    read_from_head true
    <parse>
        @type json
        time_key @timestamp
        time_format %Y-%m-%dT%H:%M:%S.%N%z
        keep_time_key true
    </parse>
</source>

<filter kubernetes.**>
    @type kubernetes_metadata
    skip_container_metadata "true"
</filter>

<filter kubernetes.var.log.containers.**prod**>
    @id filter_concat
    @type concat
    key log
    use_first_timestamp true
    multiline_end_regexp /\n$/
    separator ""
</filter>

<filter kubernetes.var.log.containers.**prod**>
    @type record_transformer
    <record>
        log_json ${record["log"]}
    </record>
    remove_keys $.kubernetes.pod_id,$.kubernetes.container_image
</filter>

<filter kubernetes.var.log.containers.**prod**>
    @type parser
    @log_level debug
    key_name log_json
    #reserve_time true
    reserve_data true
    remove_key_name_field true
    emit_invalid_record_to_error true
    <parse>
        @type json
    </parse>
</filter>    

<match kubernetes.var.log.containers.**prod**>
    @type elasticsearch
    @log_level info
    include_tag_key true
    suppress_type_name true
    host "eslogging-prod.abc.com"
    port 80
    reload_connections false
    logstash_format true
    logstash_prefix ${$.kubernetes.labels.app}
    reconnect_on_error true
    num_threads 8
    request_timeout 2147483648
    compression_level best_compression
    compression gzip
    include_timestamp true
    utc_index false
    time_key_format "%Y-%m-%dT%H:%M:%S.%N%z"
    time_key time
    reload_on_failure true
    prefer_oj_serializer true
    bulk_message_request_threshold -1
    slow_flush_log_threshold 30.0
    log_es_400_reason true
    <buffer tag, $.kubernetes.labels.app>
        @type file
        path /var/log/fluentd-buffers/kubernetes-apps.system.buffer
        flush_mode interval
        retry_type exponential_backoff
        flush_thread_count 8
        flush_interval 5s
        retry_forever true 
        retry_max_interval 30
        chunk_limit_size 200M
        queue_limit_length 512
        overflow_action throw_exception           
    </buffer>
</match>

Upvotes: 0

Views: 10986

Answers (1)

Karsten Schnitter

Reputation: 301

You are running into indexing rate limiting by Elasticsearch: the 429 es_rejected_execution_exception in your log means the cluster is rejecting bulk requests because they would exceed its indexing pressure limit. Have a look at this article for hints: https://chenriang.me/elasticsearch-bulk-insert-rejection.html I suggest reducing the chunk size in Fluentd and increasing the retry back-off.
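For example, the buffer section of your match block could be adjusted along these lines. The exact values are only a starting point to experiment with, not something I have tested against your cluster; the rest of your settings can stay as they are:

<buffer tag, $.kubernetes.labels.app>
    @type file
    path /var/log/fluentd-buffers/kubernetes-apps.system.buffer
    flush_mode interval
    flush_interval 5s
    flush_thread_count 8
    retry_type exponential_backoff
    retry_forever true
    # back off further between retries instead of hammering the cluster
    retry_max_interval 60
    # much smaller chunks, so a single bulk request stays well below the
    # ~1.5 GB coordinating limit reported in the 429 error
    chunk_limit_size 16M
    queue_limit_length 512
    overflow_action throw_exception
</buffer>

If you still see 429s with small chunks, the limit itself is governed by the indexing_pressure.memory.limit setting on the Elasticsearch side (10% of the heap by default), but shrinking the chunk size is usually the first thing to try.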

Upvotes: 0
