I am writing a Python program using the official Avro library for Python, version 1.8.2.
Here is a simple schema that illustrates my problem:
{
  "type": "record",
  "namespace": "com.example",
  "name": "NameUnion",
  "fields": [
    {
      "name": "name",
      "type": [
        {
          "type": "record",
          "namespace": "com.example",
          "name": "FullName",
          "fields": [
            {
              "name": "first",
              "type": "string"
            },
            {
              "name": "last",
              "type": "string"
            }
          ]
        },
        {
          "type": "record",
          "namespace": "com.example",
          "name": "ConcatenatedFullName",
          "fields": [
            {
              "name": "entireName",
              "type": "string"
            }
          ]
        }
      ]
    }
  ]
}
Possible datums for this schema would be {"name": {"first": "Hakuna", "last": "Matata"}} and {"name": {"entireName": "Hakuna Matata"}}.
However, this leaves room for ambiguity, because Avro cannot always tell which schema in the union a datum is meant for. In this example each datum matches one and only one schema in the union, but there can be cases where more than one schema in the union would be valid.
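To make the ambiguity concrete, here is a minimal sketch that lists which union branches a datum could belong to. It is a deliberately simplified, hypothetical check (exact field names, all values strings), not the avro library's own union-resolution logic. The helper names and the extra `Alias` record are assumptions added purely for illustration; `Alias` is not part of the schema above.

```python
# Hypothetical sketch: a simplified branch check, NOT the avro library's
# own union-resolution logic. A dict "matches" a record branch here when its
# keys are exactly the record's field names and every value is a str
# (all fields in this example schema are of type "string").

FULL_NAME = {
    "type": "record", "namespace": "com.example", "name": "FullName",
    "fields": [{"name": "first", "type": "string"},
               {"name": "last", "type": "string"}],
}
CONCATENATED = {
    "type": "record", "namespace": "com.example", "name": "ConcatenatedFullName",
    "fields": [{"name": "entireName", "type": "string"}],
}
# Hypothetical extra branch (not in the question's schema) with the same
# fields as FullName, used only to show how ambiguity can arise.
ALIAS = {
    "type": "record", "namespace": "com.example", "name": "Alias",
    "fields": [{"name": "first", "type": "string"},
               {"name": "last", "type": "string"}],
}


def fullname(schema):
    """Namespace-qualified name of a record schema."""
    return schema["namespace"] + "." + schema["name"]


def matches(record_schema, datum):
    """True if datum has exactly the record's fields, all holding strings."""
    if not isinstance(datum, dict):
        return False
    names = {field["name"] for field in record_schema["fields"]}
    return set(datum) == names and all(isinstance(datum[n], str) for n in names)


def candidate_branches(union, datum):
    """Full names of every union branch the datum matches, in union order."""
    return [fullname(s) for s in union if matches(s, datum)]


union = [FULL_NAME, CONCATENATED]
print(candidate_branches(union, {"first": "Hakuna", "last": "Matata"}))
# -> ['com.example.FullName']
print(candidate_branches(union, {"entireName": "Hakuna Matata"}))
# -> ['com.example.ConcatenatedFullName']
print(candidate_branches([FULL_NAME, ALIAS, CONCATENATED],
                         {"first": "Hakuna", "last": "Matata"}))
# -> ['com.example.FullName', 'com.example.Alias']  (ambiguous)
```

With the question's two-branch union each datum has exactly one candidate; once two branches share the same fields, the plain datum alone no longer identifies the intended branch.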
I wonder whether it would be possible to use a datum like {"name": {"FullName": {"first": "Hakuna", "last": "Matata"}}}, where the intended union schema is named explicitly in the datum.
Is this possible? If so, how?
After a lot of research, I found that the representation that carries this type information is the JSON encoding defined by the Avro specification.
Unfortunately, at the time of writing, it is supported by neither the official Python library nor fastavro.
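For reference, the Avro specification's JSON encoding writes a null union value as plain JSON null, and any other union value as a JSON object with a single name/value pair whose name identifies the chosen branch. For named types such as records, implementations typically use the namespace-qualified full name (e.g. `com.example.FullName`), which is close to the form proposed in the question. The sketch below only reshapes Python dicts into and out of that form; `encode_union_json` and `decode_union_json` are hypothetical helpers, not part of avro or fastavro, and they perform no schema validation or binary serialization.

```python
import json


def encode_union_json(branch_fullname, value):
    """Hypothetical helper: wrap a non-null union value in a single-key
    object whose key names the chosen branch (Avro JSON encoding style)."""
    return {branch_fullname: value}


def decode_union_json(encoded):
    """Hypothetical helper: return (branch_name, value) for a union value
    in Avro JSON encoding style. A null branch is plain JSON null."""
    if encoded is None:
        return None, None
    if not isinstance(encoded, dict) or len(encoded) != 1:
        raise ValueError("expected a single-key object naming the union branch")
    branch, value = next(iter(encoded.items()))
    return branch, value


datum = {"name": encode_union_json("com.example.FullName",
                                   {"first": "Hakuna", "last": "Matata"})}
print(json.dumps(datum))
# -> {"name": {"com.example.FullName": {"first": "Hakuna", "last": "Matata"}}}
print(decode_union_json(datum["name"]))
# -> ('com.example.FullName', {'first': 'Hakuna', 'last': 'Matata'})
```

Because the branch name travels with the value, a reader never has to guess which union member was meant, even when several branches share the same fields.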