Alex Punnen

Reputation: 6244

Fluentd - How to parse logs whose messages are JSON-formatted AND logs whose messages are plain text, without losing either to parse errors

Some of my services emit log messages in JSON format, and the fluentd config below parses them properly. However, it discards all logs from other components whose message field is not valid JSON.

 <source>
      @type tail
      @id in_tail_container_logs
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag "#{ENV['FLUENT_CONTAINER_TAIL_TAG'] || 'kubernetes.*'}"
      exclude_path "#{ENV['FLUENT_CONTAINER_TAIL_EXCLUDE_PATH'] || use_default}"
      read_from_head true
      #https://github.com/fluent/fluentd-kubernetes-daemonset/issues/434#issuecomment-752813739
      #<parse>
      #  @type "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TYPE'] || 'json'}"
      #  time_format %Y-%m-%dT%H:%M:%S.%NZ
      #</parse>
      #https://github.com/fluent/fluentd-kubernetes-daemonset/issues/434#issuecomment-831801690
      <parse>
        @type cri
        <parse> # this parses nested fields properly - e.g. a JSON message; but if the message is not JSON, the record is lost
          @type json
        </parse>
      </parse>
      #emit_invalid_record_to_error # when nested logging fails, see if we can parse via JSON
      #tag backend.application
    </source>

[screenshot]

But all other messages that are not valid JSON are lost.

If I comment out the nested parse section inside @type cri, then I get all logs, but messages that are JSON are no longer parsed further - especially the severity field. See the last two lines in the screenshot below.

  <parse>
        @type cri
   </parse>

[screenshot]

To overcome this, I tried using LABEL @ERROR: when nested parsing fails for logs whose message is not JSON, I still need to see the pod name, the other metadata, and the message as plain text in Kibana. However, with the config below, only logs whose message is valid JSON are parsed.

    <source>
      @type tail
      @id in_tail_container_logs
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag "#{ENV['FLUENT_CONTAINER_TAIL_TAG'] || 'kubernetes.*'}"
      exclude_path "#{ENV['FLUENT_CONTAINER_TAIL_EXCLUDE_PATH'] || use_default}"
      read_from_head true
      #https://github.com/fluent/fluentd-kubernetes-daemonset/issues/434#issuecomment-752813739
      #<parse>
      #  @type "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TYPE'] || 'json'}"
      #  time_format %Y-%m-%dT%H:%M:%S.%NZ
      #</parse>
      #https://github.com/fluent/fluentd-kubernetes-daemonset/issues/434#issuecomment-831801690
      <parse>
        @type cri
        <parse> # this parses nested fields properly - e.g. a JSON message; but if the message is not JSON, the record is lost
          @type json
        </parse>
      </parse>
      #emit_invalid_record_to_error # when nested logging fails, see if we can parse via JSON
      #tag backend.application
    </source>


    <label @ERROR> # when nested parsing fails, this is not triggered
      <filter **>
        @type parser
        key_name message
        <parse>
          @type none
        </parse>
      </filter>
      <match kubernetes.var.log.containers.elasticsearch-kibana-**> #ignore from this container
        @type null
      </match>
    </label>
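For reference, a variant I have not tried: parse only cri in the source, and attempt the JSON parse in a separate parser filter. The filter parser plugin's `reserve_data` option keeps the original record even when JSON parsing fails, which may avoid losing plain-text messages (a sketch, not tested):

```
    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      read_from_head true
      <parse>
        @type cri   # only cri here; no nested JSON parse
      </parse>
    </source>

    # try to parse the message field as JSON; if parsing fails,
    # reserve_data true keeps the record as-is instead of dropping it
    <filter kubernetes.**>
      @type parser
      key_name message
      reserve_data true
      emit_invalid_record_to_error false
      <parse>
        @type json
      </parse>
    </filter>
```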

[screenshot]

How do I get logs whose messages are JSON parsed as JSON, and logs whose messages are plain text kept as-is, without losing either?

Config here (last three commits): https://github.com/alexcpn/grpc_templates.git

Upvotes: 4

Views: 6850

Answers (1)

Al-waleed Shihadeh

Reputation: 2855

One way to solve this issue is to prepare the logs before parsing them with the cri plugin. To do so, perform the following steps:

  • Collect container logs and tag them with a given tag.
  • Classify the logs into JSON and non-JSON using rewrite_tag_filter and a regex on the message field.
  • Parse the JSON-formatted messages.
  • Keep the non-JSON messages as plain text.

Example config (not tested):

## collect raw logs from files
<source>
  @type tail
  @id in_tail_container_logs
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  exclude_path "#{ENV['FLUENT_CONTAINER_TAIL_EXCLUDE_PATH'] || use_default}"
  read_from_head true
  format json
</source>

# add metadata to the records (container_name, image etc..)
<filter kubernetes.**>
  @type kubernetes_metadata
</filter>

# classify the logs to different categories 
<match kubernetes.**>
 @type rewrite_tag_filter
 <rule>
   key message
   pattern /^\{.+\}$/
   tag json.${tag}
 </rule>
 <rule>
   key message
   pattern /^\{.+\}$/
   tag nonejson.${tag}
   invert true
 </rule>
</match>

# filter or match logs carrying the json.* tag
<filter json.**>
</filter>
<match json.**>
</match>

# filter or match logs carrying the nonejson.* tag
<filter nonejson.**>
</filter>
<match nonejson.**>
</match>
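The two empty blocks above could be filled in along these lines (still under the "not tested" caveat; the Elasticsearch host/port values are placeholders and assume fluent-plugin-elasticsearch is installed):

```
# parse the JSON payload of records tagged json.*
<filter json.**>
  @type parser
  key_name message
  reserve_data true
  <parse>
    @type json
  </parse>
</filter>

# send both categories to the same output
<match json.** nonejson.**>
  @type elasticsearch
  host elasticsearch   # placeholder host
  port 9200
  logstash_format true
</match>
```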

Upvotes: 2
