Stefano

Reputation: 114

Array management for JSON Schema in Confluent Schema Registry

Currently I have a Kafka topic to which I'm writing JSON messages like the following:

{
  "messageType": "NEW",
  "timestamp": 1656582818024,
  "fieldId": 266,
  "number": 9835,
  "contains": [
    "56644630997",
    "06014134231",
    "06014134231"
  ]
}

Please note that the "contains" list can hold an arbitrary number of strings: sometimes it is empty or populated with nulls, other times it holds thousands of strings.

The message is then written in Parquet format to S3 storage using Confluent's S3 connector.
This connector reads the JSON schemas from the Confluent Schema Registry without problems.

My problem is that I don't understand how to build the JSON schema for this particular message, since I don't know how to handle arrays.

Here is my current attempt, which isn't working.

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title":  "my_messages",
  "type": "object",
  "properties": {
    "messageType": {
      "type": ["string", "null"]
    },
    "timestamp": {
      "type": ["integer", "null"]
    },
    "fieldId": {
      "type": ["integer", "null"]
    },
    "number": {
      "type": ["integer", "null"]
    },
    "contains": {
      "type": ["array", "null"],
      "items": [
        {
          "type": ["string", "null"]
        }
      ]
    }
  },
  "additionalProperties": true
}

I'm fairly sure my Kafka connector is breaking because the "contains" array is not validated in my schema.json.

This is the error I'm getting: caused by: org.apache.kafka.connect.errors.DataException: Array schema did not specify the items type

Lastly, a link to complete JSON Schema documentation would be great. Thank you for any help.

Upvotes: 1

Views: 1185

Answers (1)

Ether

Reputation: 53976

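If you want all elements of the array in "contains" to be validated, remove the extra nesting of that subschema in an array. When "items" is itself an array, each subschema applies only to the element at the same position, so your schema constrains just the first element. That is, turn this: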

    "contains": {
      "type": ["array", "null"],
      "items": [
        {
          "type": ["string", "null"]
        }
      ]
    }

into this:

    "contains": {
      "type": ["array", "null"],
      "items": {
        "type": ["string", "null"]
      }
    }

Documentation is available here: https://json-schema.org/understanding-json-schema/reference/array.html
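
The difference between the two forms of "items" can be illustrated with a small sketch. This is not a full JSON Schema validator and not what the Confluent converter runs internally; the helper names `matches_type` and `items_valid` are hypothetical, and only the "type" keyword is handled.

```python
# Minimal sketch of draft-04 "items" semantics; not a full JSON Schema validator.

# Map JSON Schema type names to Python types (only those needed here).
JSON_TYPES = {
    "string": str,
    "integer": int,
    "null": type(None),
}


def matches_type(value, schema):
    """Return True if value satisfies the schema's "type" keyword."""
    allowed = schema.get("type")
    if allowed is None:
        return True  # no "type" keyword: anything passes
    if isinstance(allowed, str):
        allowed = [allowed]
    for name in allowed:
        py_type = JSON_TYPES[name]
        # bool is a subclass of int in Python, but JSON booleans are not integers.
        if name == "integer" and isinstance(value, bool):
            continue
        if isinstance(value, py_type):
            return True
    return False


def items_valid(array, items):
    """Check an array against an "items" keyword in either form."""
    if isinstance(items, list):
        # Array (tuple) form: items[i] applies only to array[i]. Extra elements
        # are unconstrained because "additionalItems" defaults to allowing anything.
        return all(matches_type(v, s) for v, s in zip(array, items))
    # Single-schema form: the same schema applies to every element.
    return all(matches_type(v, items) for v in array)


tuple_form = [{"type": ["string", "null"]}]  # as in the question
single_form = {"type": ["string", "null"]}   # as in the fix

sample = ["56644630997", 42, None]  # the second element is not a string

print(items_valid(sample, tuple_form))   # True: only the first element is checked
print(items_valid(sample, single_form))  # False: every element is checked
```

The Connect error in the question is consistent with the converter expecting the single-schema form to determine the element type, though that is an assumption rather than something verified against the converter's source.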

Upvotes: 2
