Reputation: 8521
Is it possible to have an optional field in an Avro schema (i.e. the field does not appear at all in the .JSON file)?
In my Avro schema, I have two fields:
{"name": "author", "type": ["null", "string"], "default": null},
{"name": "importance", "type": ["null", "string"], "default": null},
And in my JSON files those two fields can exist or not.
However, when they do not exist, I receive an error (e.g. when I test such a JSON file using avro-tools command line client):
Expected field name not found: author
I understand that as long as the field name exists in the JSON, it can be null or a string value, but what I'm trying to express is something like "this JSON is valid if those field names do not exist, OR if they exist and they are null or a string".
Is this possible to express in an Avro schema? If so, how?
Upvotes: 48
Views: 65428
Reputation: 19
Use the default attribute with a null value and the union type [null, original_type]. undefined is not supported (see the docs).
In the case of an object, it should look like this:
const avro = require('avsc');
const yourSchema = avro.Type.forSchema({
  type: 'record',
  name: 'parent_record',
  fields: [
    { name: 'field_1', type: ['null', 'string'], default: null },
    {
      name: 'optional_object_type',
      type: ['null', {
        type: 'record',
        name: 'optional_record',
        fields: [{ name: 'sub_field', type: 'string' }]
      }],
      default: null
    }
  ]
});
Upvotes: 1
Reputation: 56
Are you providing the type ("null" or "string") as a key in the object to be serialized, or just trying to serialize a bare object?
Avro implements tagged unions and will not perform type inference to decide which type an object is. This means that you have to provide a type tag.
I am testing with Node and avro-js. The following works:
const avro = require( "avro-js" );
const schema = {
  type: "record", name: "test", fields: [
    {
      "name": "author", "type": ["null", "string"],
      "default": null
    },
    {
      "name": "importance", "type": ["null", "string"],
      "default": null
    },
  ]
};
const s = avro.parse( schema );
s.toBuffer( {
  author: { null: null },
  importance: { null: null }
} ).toString();
// '\x00\x00'
s.toBuffer( {
  author: { string: 'Homer' },
  importance: { string: '1' }
} ).toString();
// '\x02\nHomer\x02\x021'
I find that I can serialize an empty object because default values are provided:
s.toBuffer( {} ).toString();
// '\x00\x00'
However, this may be implementation-specific. Can you provide reproduction instructions so we can help further?
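To illustrate the type tags mentioned above without depending on any library, here is a minimal sketch of wrapping plain values the way Avro's JSON encoding writes union values: a null stays a bare null, and any other value becomes an object keyed by its branch name. The helper name tagUnion and its simple type detection are assumptions made for this example, and it only covers "null", "string" and "boolean" branches.

```javascript
// Sketch only: tagUnion is a hypothetical helper, not part of avro-js
// or any other Avro library.
function tagUnion(unionTypes, value) {
  // In Avro's JSON encoding, a null union value stays a bare null.
  if (value === null) {
    if (!unionTypes.includes("null")) {
      throw new TypeError("union has no null branch");
    }
    return null;
  }
  // Map the JavaScript type to the matching primitive Avro branch name.
  const branchByJsType = { string: "string", boolean: "boolean" };
  const branch = branchByJsType[typeof value];
  if (branch === undefined || !unionTypes.includes(branch)) {
    throw new TypeError("no matching union branch for value: " + String(value));
  }
  // Non-null values are wrapped as { branchName: value }.
  return { [branch]: value };
}

const tagged = {
  author: tagUnion(["null", "string"], "Homer"),
  importance: tagUnion(["null", "string"], null),
};
console.log(JSON.stringify(tagged));
// {"author":{"string":"Homer"},"importance":null}
```

Note that the field names are still present in the output; tagging a value does not by itself make a field omittable.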
Upvotes: 1
Reputation: 1194
You can define the default attribute, for example as the string "undefined", so that the field can be skipped:
{
  "name": "first_name",
  "type": "string",
  "default": "undefined"
},
Also, all fields are mandatory in Avro. If you want a field to be optional, union its type with null. Example:
{
  "name": "username",
  "type": [
    "null",
    "string"
  ],
  "default": null
},
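If the tool that reads your JSON still insists on every field name being present (as in the error from the question), one possible workaround is to pre-process each JSON record and add the missing fields from the schema's defaults before handing it over. The following is only a sketch under that assumption; fillDefaults is a hypothetical helper written for this example, not an Avro API, and it only handles top-level fields.

```javascript
// Sketch only: fillDefaults is a hypothetical helper, not an Avro API.
// It copies a record and adds every schema field that is missing from it,
// using the default declared for that field.
function fillDefaults(schema, record) {
  const result = { ...record };
  for (const field of schema.fields) {
    if (field.name in result) continue; // field supplied, keep it as-is
    if (!("default" in field)) {
      throw new Error("missing field without default: " + field.name);
    }
    result[field.name] = field.default;
  }
  return result;
}

const schema = {
  type: "record",
  name: "test",
  fields: [
    { name: "author", type: ["null", "string"], default: null },
    { name: "importance", type: ["null", "string"], default: null },
  ],
};

console.log(JSON.stringify(fillDefaults(schema, {})));
// {"author":null,"importance":null}
```

A field that has no default and is absent from the record raises an error here, which mirrors the point that fields are mandatory unless the schema says otherwise.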
Upvotes: 49
Reputation: 1411
According to the Avro specification, this is possible using the default attribute.
See https://avro.apache.org/docs/1.8.2/spec.html
default: A default value for this field, used when reading instances that lack this field (optional). Permitted values depend on the field's schema type, according to the table below. Default values for union fields correspond to the first schema in the union.
In the example you gave, you do add the default attribute with the value null, so this should work. However, support for this also depends on the library you use to read the Avro message (there are libraries for C, C++, Python, Java, C#, Ruby, etc.). Maybe (probably) the library you use lacks this feature.
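The last sentence of the quoted spec text is easy to get wrong: with the 1.8.2 wording, a default of null only fits a union whose first branch is "null". As a rough illustration, a check for that rule could look like the sketch below. The function name defaultMatchesFirstBranch is made up for this example, and it only covers the "null" and "string" branches used in the question.

```javascript
// Sketch only: defaultMatchesFirstBranch is a hypothetical check covering
// just the "null" and "string" branch types.
function defaultMatchesFirstBranch(field) {
  // For a union (array of types), the default must fit the first branch.
  const first = Array.isArray(field.type) ? field.type[0] : field.type;
  if (first === "null") return field.default === null;
  if (first === "string") return typeof field.default === "string";
  throw new Error("branch type not covered by this sketch: " + first);
}

console.log(defaultMatchesFirstBranch(
  { name: "author", type: ["null", "string"], default: null }
)); // true

console.log(defaultMatchesFirstBranch(
  { name: "author", type: ["string", "null"], default: null }
)); // false
```

So the schema in the question, which lists "null" first, satisfies this rule; reversing the branch order while keeping "default": null would not.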
Upvotes: 15