gregp

Reputation: 61

Validating PubSub message against AVRO JSON schema with multiple union types

I'm having trouble publishing messages to a new Pub/Sub topic that has an Avro schema attached. When I publish a message from PHP using the Google\Cloud\PubSub\PubSubClient library, I get this error:

{
  "error": {
    "code": 400,
    "message": "Invalid data in message: Message failed schema validation.",
    "status": "INVALID_ARGUMENT",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.ErrorInfo",
        "reason": "INVALID_JSON_AVRO_MESSAGE",
        "domain": "pubsub.googleapis.com",
        "metadata": {
          "message": "Message failed schema validation",
          "revisionInfo": "Could not validate message with any schema revision for schema: projects/foo-project/schemas/foo-schema, last checked revision: revision_id=foo-revision-id failed with status: Invalid data in message: JSON object with type string does not match schema which expected object."
        }
      }
    ]
  }
}

I tried to validate my message in the Google Cloud Console (https://console.cloud.google.com/cloudpubsub/schema/detail/foo-schema?project=foo-project) using the "Test message" UI, but every combination returns the error "Invalid JSON-encoded message against Avro schema." without any details.

Adding the optional fields with null values doesn't work, and wrapping action_type inside the action field doesn't help. Adding the nested "name": null inside the account object doesn't help either, nor does any combination of the above. I'm quite desperate now.

Interesting fact: according to avro_validator, the message has the correct format.

This is my example message:

{
    "action": "create",
    "url": "https://my-api.com/resource/new_resource_name",
    "operation": "created",
    "callback_url": "https://my-another-api/com/resource/new_resource_name",
    "name": "new_resource_name",
    "source": "service_name",
    "account": {"number": 2830602},
    "operation_metadata": "{\"created_on\":\"2024-06-24T08:47:14+00:00\"}"
}

This is the schema I've created in GCP:

{
  "fields": [
    {
      "name": "action",
      "type": [
        "null",
        {
          "name": "action_type",
          "symbols": [
            "create",
            "another_action_type",
            "another_action_type2",
            "another_action_type3"
          ],
          "type": "enum"
        }
      ]
    },
    {
      "name": "url",
      "type": "string"
    },
    {
      "name": "operation",
      "type": {
        "name": "operation_type",
        "symbols": [
          "created",
          "another_operation_type",
          "another_operation_type2",
          "another_operation_type3"
        ],
        "type": "enum"
      }
    },
    {
      "name": "callback_url",
      "type": "string"
    },
    {
      "name": "name",
      "type": "string"
    },
    {
      "default": "default_service_name",
      "name": "source",
      "type": {
        "name": "source_service",
        "symbols": [
          "service_name1",
          "service_name2"
        ],
        "type": "enum"
      }
    },
    {
      "default": null,
      "name": "homepage_url",
      "type": [
        "null",
        "string"
      ]
    },
    {
      "default": null,
      "name": "account",
      "type": [
        "null",
        {
          "fields": [
            {
              "default": null,
              "name": "number",
              "type": [
                "null",
                "int"
              ]
            },
            {
              "default": null,
              "name": "name",
              "type": [
                "null",
                "string"
              ]
            }
          ],
          "name": "account_record",
          "type": "record"
        }
      ]
    },
    {
      "default": null,
      "name": "cluster",
      "type": [
        "null",
        {
          "fields": [
            {
              "default": null,
              "name": "number",
              "type": [
                "null",
                "int"
              ]
            }
          ],
          "name": "cluster_record",
          "type": "record"
        }
      ]
    },
    {
      "default": null,
      "name": "type",
      "type": [
        "null",
        {
          "name": "environment_type",
          "symbols": [
            "DEVELOPMENT",
            "STAGING",
            "PRODUCTION"
          ],
          "type": "enum"
        }
      ]
    },
    {
      "default": null,
      "name": "error",
      "type": [
        "null",
        "string"
      ]
    },
    {
      "default": null,
      "name": "operation_metadata",
      "type": [
        "null",
        "string"
      ]
    }
  ],
  "name": "MyFooEvents",
  "type": "record"
}

If anyone has an idea, please give me a hint.

Upvotes: 0

Views: 818

Answers (1)

Kamal Aboul-Hosn

Reputation: 17261

The message has several issues:

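  1. It does not conform to the JSON encoding rules for Avro messages. A non-null union value must be written as a nested object whose single key is the name of the chosen type, for example {"string": "..."} or {"action_type": "create"}. A null value stays a plain null.
  2. "service_name" is not a valid enum value for the "source" field. The schema only allows "service_name1" and "service_name2".
  3. Several fields are missing: homepage_url, cluster, type, error, and the name field inside account. Even nullable fields must be present in the JSON, set to null.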

Here is a valid version of the message:

{
    "action": {
      "action_type": "create"
    },
    "url": "https://my-api.com/resource/new_resource_name",
    "operation": "created",
    "callback_url": "https://my-another-api/com/resource/new_resource_name",
    "homepage_url": null,
    "name": "new_resource_name",
    "source": "service_name1",
    "account": {
      "account_record": {
        "number": {
          "int": 2830602
        },
        "name": null
      }
    },
    "cluster": null,
    "type": null,
    "operation_metadata": {
      "string": "{\"created_on\":\"2024-06-24T08:47:14+00:00\"}"
    },
    "error": null
}
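
If you would rather not hand-wrap every union, the rules above can be applied mechanically. What follows is only a rough sketch in Python (not the PHP client, and not anything Pub/Sub provides). The helper names `branch_name`, `matches` and `to_avro_json` are made up for illustration, and `SCHEMA` is a trimmed copy of the schema from the question. It assumes there are no namespaces and no references to named types by string, and it covers only the type kinds this schema uses.

```python
import json

# Trimmed, illustrative subset of the schema from the question (assumption:
# no namespaces, so the union key for a named type is just its "name").
SCHEMA = {
    "type": "record",
    "name": "MyFooEvents",
    "fields": [
        {"name": "action", "type": ["null", {
            "type": "enum", "name": "action_type",
            "symbols": ["create", "another_action_type"]}]},
        {"name": "url", "type": "string"},
        {"name": "source", "type": {
            "type": "enum", "name": "source_service",
            "symbols": ["service_name1", "service_name2"]}},
        {"name": "homepage_url", "type": ["null", "string"], "default": None},
        {"name": "account", "default": None, "type": ["null", {
            "type": "record", "name": "account_record", "fields": [
                {"name": "number", "type": ["null", "int"], "default": None},
                {"name": "name", "type": ["null", "string"], "default": None},
            ]}]},
        {"name": "operation_metadata", "type": ["null", "string"], "default": None},
    ],
}


def branch_name(schema):
    """Key used for a union branch: primitive name, or the named type's name."""
    if isinstance(schema, str):
        return schema
    return schema.get("name", schema["type"])


def matches(schema, value):
    """Rough check of whether a plain Python value fits a union branch."""
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind == "null":
        return value is None
    if kind in ("int", "long"):
        return isinstance(value, int) and not isinstance(value, bool)
    if kind == "string":
        return isinstance(value, str)
    if kind == "enum":
        return value in schema["symbols"]
    if kind == "record":
        return isinstance(value, dict)
    return False  # other kinds are out of scope for this sketch


def to_avro_json(schema, value):
    """Convert a plain dict into the shape Avro's JSON encoding expects."""
    if isinstance(schema, list):  # union: wrap every non-null value
        for branch in schema:
            if matches(branch, value):
                if value is None:
                    return None
                return {branch_name(branch): to_avro_json(branch, value)}
        raise ValueError(f"{value!r} matches no union branch")
    if isinstance(schema, dict) and schema["type"] == "record":
        out = {}
        for field in schema["fields"]:  # every field must appear in the output
            name = field["name"]
            if name in value:
                raw = value[name]
            elif "default" in field:
                raw = field["default"]
            else:
                raise ValueError(f"missing required field {name!r}")
            out[name] = to_avro_json(field["type"], raw)
        return out
    if isinstance(schema, dict) and schema["type"] == "enum":
        if value not in schema["symbols"]:
            raise ValueError(f"{value!r} is not a symbol of {schema['name']}")
        return value
    return value  # primitives outside a union pass through unchecked


if __name__ == "__main__":
    plain = {
        "action": "create",
        "url": "https://my-api.com/resource/new_resource_name",
        "source": "service_name1",
        "account": {"number": 2830602},
        "operation_metadata": "{\"created_on\":\"2024-06-24T08:47:14+00:00\"}",
    }
    print(json.dumps(to_avro_json(SCHEMA, plain), indent=2))
```

Passing the original "source": "service_name" through this helper raises a ValueError, which points at issue 2 more directly than the generic Pub/Sub validation error does.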

Upvotes: 2
