Ohad

Reputation: 343

Flatten nested JSON using fluentd

I have a program that writes structured logs like the following:

{
    "time": "time_val",
    "log": "{
        \"field1\": \"value1\",
        \"field2\": \"value2\",
        \"field3\": \"{
            \"nested_field1\": \"value1\",
            \"nested_field2\": \"value2\",
            \"nested_field3\": \"value3\"
        }\"
    }"
}

I am using fluentd to tail the container's output and parse the JSON messages. However, I would also like to parse the nested structured logs so that their fields are flattened into the original message. For the example above, I would want fluentd to end up treating the message as:

{
    "time": "time_val",
    "field1": "value1",
    "field2": "value2",
    "nested_field1": "value1",
    "nested_field2": "value2",
    "nested_field3": "value3"
}

Is this something that can be done using fluentd configuration? Changing the original program behavior is not an option in my case.

Upvotes: 3

Views: 1370

Answers (1)

Azeem

Reputation: 14637

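You can use the parser filter plugin with its key_name, reserve_data, and remove_key_name_field parameters: key_name selects the field to parse, reserve_data keeps the original fields in the result, and remove_key_name_field drops the parsed field once parsing succeeds.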

Example:

<filter **>
  @type parser
  key_name field3
  reserve_data true
  remove_key_name_field true
  <parse>
    @type json
  </parse>
</filter>
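To make the effect of these three parameters concrete, here is a rough Python model of what the filter does to a single record. This is only an illustrative sketch, not fluentd's actual implementation; the helper name parse_key is hypothetical.

```python
import json

def parse_key(record, key_name, reserve_data=True, remove_key_name_field=True):
    """Hypothetical model of the parser filter: parse record[key_name] as JSON
    and merge the parsed fields into the record."""
    parsed = json.loads(record[key_name])
    # reserve_data: start from the original fields instead of an empty record
    merged = dict(record) if reserve_data else {}
    # remove_key_name_field: drop the raw string once it has been parsed
    if remove_key_name_field:
        merged.pop(key_name, None)
    merged.update(parsed)
    return merged

record = {
    "field1": "value1",
    "field2": "value2",
    "field3": '{"nested_field1":"value1","nested_field2":"value2","nested_field3":"value3"}',
}
flat = parse_key(record, "field3")
print(json.dumps(flat))
```

With both options enabled, field3 disappears and its nested keys sit next to field1 and field2, which is the flattening the question asks for.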

Here is a complete working example, using a valid version of your JSON:

{"field1":"value1","field2":"value2","field3":"{\"nested_field1\":\"value1\",\"nested_field2\":\"value2\",\"nested_field3\":\"value3\"}"}

fluent-flatten-json.conf

<source>
  @type forward
</source>

<filter **>
  @type parser
  key_name field3
  reserve_data true
  remove_key_name_field true
  <parse>
    @type json
  </parse>
</filter>

<match **>
  @type stdout
</match>

Run fluentd:

fluentd -c ./fluent-flatten-json.conf

From another terminal, run fluent-cat with input JSON:

fluent-cat test <<< '{"field1":"value1","field2":"value2","field3":"{\"nested_field1\":\"value1\",\"nested_field2\":\"value2\",\"nested_field3\":\"value3\"}"}'

Output in fluentd logs:

{"field1":"value1","field2":"value2","nested_field1":"value1","nested_field2":"value2","nested_field3":"value3"}

Formatted output:

{
  "field1": "value1",
  "field2": "value2",
  "nested_field1": "value1",
  "nested_field2": "value2",
  "nested_field3": "value3"
}

UPDATE

For valid, double-nested, escaped JSON such as:

{"time":"time_val","log":"{\"field1\":\"value1\",\"field2\":\"value2\",\"field3\":\"{\\\"nested_field1\\\":\\\"nested_value1\\\",\\\"nested_field2\\\":\\\"nested_value2\\\",\\\"nested_field3\\\":\\\"nested_value3\\\"}\"}"}

The double-nested JSON in the question is not valid, so I had to recreate it as shown above.

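The following should work. Filters are applied in the order they are defined, so the first one parses log and the second one parses the field3 that it exposes: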

<filter **>
  @type parser
  key_name log
  reserve_data true
  remove_key_name_field true
  <parse>
    @type json
  </parse>
</filter>

<filter **>
  @type parser
  key_name field3
  reserve_data true
  remove_key_name_field true
  <parse>
    @type json
  </parse>
</filter>
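If you need to produce a valid double-nested test input like the one above, or want to check the expected result of the two passes outside fluentd, a small Python sketch can help. It assumes the same field names as the example; parse_key is a hypothetical helper that only approximates the parser filter with reserve_data and remove_key_name_field both set to true.

```python
import json

compact = {"separators": (",", ":")}

# Build the escaped input by serializing from the inside out.
inner = {
    "nested_field1": "nested_value1",
    "nested_field2": "nested_value2",
    "nested_field3": "nested_value3",
}
log = {"field1": "value1", "field2": "value2", "field3": json.dumps(inner, **compact)}
raw = json.dumps({"time": "time_val", "log": json.dumps(log, **compact)}, **compact)
print(raw)

def parse_key(record, key_name):
    """Hypothetical stand-in for one parser filter pass."""
    merged = dict(record)
    merged.update(json.loads(merged.pop(key_name)))
    return merged

# Apply the two passes in the same order as the two filters.
record = json.loads(raw)
for key in ("log", "field3"):
    record = parse_key(record, key)
print(json.dumps(record, indent=2))
```

The printed raw string can be piped to fluent-cat in place of a hand-escaped one, and the final record shows time alongside the flattened fields.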

Upvotes: 3
