J. Adair

Reputation: 131

How to write data in Avro with the C++ interface when the field is nullable?

First, I searched for this question. I found an answer for the C interface and one for Java, but none for C++. Unfortunately, the methods invoked in the C example don't exist in the C++ API, so one can't simply mimic the answer provided in that particular Stack Overflow discussion.

I am attempting something that should be rather simple, yet after an hour or two I have only managed to get closer to an answer without actually finding one. In the interest of simplicity, I reduced the record that I am attempting to write to a single field: a string that can be null. In Avro this means that the field is optional, which is accomplished through an Avro union, where the convention is that the null type comes first in the schema for that field.

What I've learned thus far from a considerable amount of trial and error:

  1. You need an encoder and decoder within a templated codec_traits struct for the record you want to write. This is typically defined in a header somewhere.
  2. If loading the schema from a file, which I am doing, then you need that schema defined in JSON format in a separate file.
  3. In your C++ code, you declare an avro::DataFileWriter using the schema that you load, along with a record from the aforementioned header. You then have a local record that you populate with your data and then you invoke the write() method.

Should be simple enough, yet not so much. For the particulars, per the list above, the following is the code that I am currently using:

  1. The header:
    #ifndef RECURSIVE_HH
    #define RECURSIVE_HH
    
    #include <string>   // for std::string used in the struct below
    
    #include "Specific.hh"
    #include "Encoder.hh"
    #include "Decoder.hh"
    
    namespace recursive_record
    {
       struct recursive_data
       {
          std::string   fstring;
    
       };
    }
    
    namespace avro
    {
       template<> struct codec_traits<recursive_record::recursive_data>
       {
          static void encode( Encoder& e, const recursive_record::recursive_data& v )
          {
             avro::encode( e, v.fstring );
    
          }
    
          static void decode( Decoder& d, recursive_record::recursive_data& v )
          {
             avro::decode( d, v.fstring );
    
          }
       };
    }
    
    #endif /* RECURSIVE_HH */
  2. The JSON schema file:
    {
        "type": "record",
        "name": "Root",
        "fields": [
            {
                "name": "fstring",
                "type": [
                    "null",
                    "string"
                ]
            }
        ]
    }
  3. The main C++ file (note that I have snipped the file for brevity, so some of the headers included by the full file don't appear to be used in the code shown):
    #include <fstream>   // for std::ifstream in loadSchema()
    
    #include "recursive.h"
    #include "Encoder.hh"
    #include "Decoder.hh"
    #include "Generic.hh"
    #include "GenericDatum.hh"
    #include "ValidSchema.hh"
    #include "DataFile.hh"
    #include "Types.hh"
    #include "Compiler.hh"
    #include "Stream.hh"
    
    avro::ValidSchema loadSchema(const char* filename)
    {
        std::ifstream ifs(filename);
        avro::ValidSchema result;
        avro::compileJsonSchema(ifs, result);
        return result;
    }
    
    
    int main( int argc, char** argv )
    {
       /**********************************************************************************
                                  AVRO WRITER EXAMPLE
       **********************************************************************************/
       try
       {
          //Filename definitions skipped for brevity
    
          avro::ValidSchema          recursiveSchema = loadSchema( schemaFilename );
          avro::DataFileWriter<recursive_record::recursive_data>   dfw( filename, recursiveSchema );
          recursive_record::recursive_data       record;
          record.fstring = std::string("First string");
    
          dfw.write( record );
          dfw.close();
    
       }
       catch( const std::exception& e )
       {
          // Log a message
          return -1;
    
       }
    }

"So what's the problem?" you might ask. Well, the file is actually written successfully, at least in the sense that the code doesn't crash and an Avro data file is produced. So far, so good. However, if you attempt to read that file, you receive the following error:

    AVRO read error: vector::_M_range_check: __n (which is 12) >= this->size() (which is 2)

Wha-??? Yeah. 'Been working on this all afternoon.

After considerable experimentation, I discovered that the problem was due to the nullable aspect of the field. I also noticed that if I remove the nullable option from the schema, so that the schema becomes this:

    {
        "type": "record",
        "name": "Root",
        "fields": [
            {
                "name": "fstring",
                "type": "string"
            }
        ]
    }

and change nothing else, then the new Avro data file is not only written successfully, but is read successfully too:

    [rh6lgn01][1881] MY_EXAMPLES/generate_recursive$ recursive
    schema=recursive.json
    file=./DATA/recursive.avro
    recursiveSchema valid = true
    ReadFile(): Type = record
    ProcessRecord(): New record found.  Field count = 1
    ProcessRecord(): {
    ProcessRecord():   Field 0: type = string
    ProcessDatum():   Field 0: value = First string (length= 12)
    ProcessRecord(): }
    rowCount = 1
    
    AVRO Writing and Reading Complete
    [rh6lgn01][1882] MY_EXAMPLES/generate_recursive$

I had some hope when I read the Java question. One answer noted that, in Java, there is a @Nullable annotation that you can attach to a field in a record. Here is a link to that question: Storing null values in avro files

There is of course no such mechanism in C++. I did find in the Types.hh header the following line of code that seemed related:

    /// define a type to identify Null in template functions
    struct AVRO_DECL Null { };

However, I couldn't make heads or tails of how to use it in a similar fashion. So either I'm missing something or it has a different purpose. I fear the former but suspect the latter.

And this is a link to the stackoverflow C issue, along with its answer, for completion: Write nullable item to avro record in Avro C

I am using version 1.9.2 of the Avro C++ library, running on a GNU/Linux box (not that it should matter, but for completion).

I will continue to prod and seek an answer, but if anyone has done this previously and can shed some light, I would appreciate the feedback.

Thanks!

Upvotes: 4

Views: 2921

Answers (1)

J. Adair

Reputation: 131

Alright, after toying with this until the wee hours of the morning and all day today, I finally figured it out. So I thought I'd post an answer to my own question, in case someone else is searching for the same information. I'll try to be brief, but if you aren't into detail, I'd suggest you stop reading now.

In the end I discovered that there are two approaches one can take to resolve this issue. Both yield the same result: the ability to write data into a field/column in an Avro data file where that field has been declared as optional in the schema, that is, it has the "null union" attached to its type. I will begin with the approach closest to the one expressed in my original question, then provide an alternative solution, and conclude with an observation or two. Note that in both approaches the JSON schema remains unchanged from my initial post; only the header and the code body changed. See my initial post for the schema content.

So, the first approach. As with my first attempt, it involves a custom encoder and decoder (as shown in the header file in my original post), a JSON schema (mine was in a separate file), and then the primary body of code. To keep things short, the problem lay in the header, as I suspected. To fix it, you need to avoid writing that header yourself for anything beyond the most rudimentary scenarios, such as those shown in the examples that come with the Avro C++ distribution. Rather, you should let the Avro tool named "avrogencpp" do the heavy lifting of creating the custom encoder/decoder. I recommend this simply because the code that avrogencpp produces in that header is convoluted, to say the least. Once you read and understand it, the content makes sense, but for a record with more than a few fields the length becomes rather unwieldy for a human. Thus, let machines do what they do best. Anyway, this was the command I used:

    avrogencpp -i recursive.json -o recursive.h -n recursive_namespace

The result was a header that, nestled in its innards, had a struct definition named "Root", which matched the name of my top-level, or outermost, record as defined in the unchanged schema (no coincidence). With that, I could write the following in the main body of code:

      avro::ValidSchema          recursiveSchema = loadSchema( schemaFilename );
      avro::DataFileWriter<recursive_namespace::Root>   dfw( filename, recursiveSchema );
      recursive_namespace::Root  record;
      // snipped for brevity
      record.fstring.set_string( "String set via direct record value assignment" );
      dfw.write( record );
      dfw.close();

This would be successful, as seen in the output:

    [rh6lgn01][2174] MY_EXAMPLES/generate_recursive$ recursive
    schema=recursive.json
    file=./DATA/recursive.avro
    recursiveSchema valid = 1
    ReadFile(): Enter
    ReadFile(): Type = record
    ProcessRecord(): New record found.  Field count = 1
    ProcessRecord(): {
    ProcessRecord():   Field 0: type = string
    ProcessDatum():   Field 0: value = String set via direct record value assignment (length = 45)
    ProcessRecord(): }
    rowCount = 1
    -----------------------
    
    AVRO Writing and Reading Complete
    [rh6lgn01][2175] MY_EXAMPLES/generate_recursive$

And so that's that. Now to the second approach. This uses the GenericDatum class and is similar to the problem and answer shown in this stackoverflow discussion:

How to read data from AVRO file using C++ interface?

In a way, one could argue that this approach has the benefit that you don't need a custom encoder/decoder, and thus don't need the avrogencpp tool either. While that is true, I must admit to wondering about the performance of using the generic "interface" in Avro; it just seems like it might be a tad slower than the direct route. On the other hand, it can read any file and is thus more flexible. But I digress; back to the solution. The only code you need is in the main body. Granted, what I am about to present is snipped to the bare essentials in order to demonstrate the approach, so in real life you would need to flesh it out to handle other types, etc. However, it will convey the idea, which is all you need. And this is it:

      avro::DataFileWriter<avro::GenericDatum>   writer( filename, schema );
      avro::GenericDatum    datum( schema );

      if( avro::AVRO_RECORD == datum.type() )
      {
         avro::GenericRecord  &record = datum.value<avro::GenericRecord>();
         for( uint32_t i = 0; i < record.fieldCount(); i++ )
         {
            avro::GenericDatum &fieldDatum = record.fieldAt( i );

            // If the datum is a union, then it's likely that the field
            // is optional.  We'd need to flesh this out considerably to
            // ensure that this was indeed the case, but for brevity
            // reasons, this will work:
            if( fieldDatum.isUnion() )
            {
               // Assuming the well-known Avro convention of the null
               // being first in the optional "syntax", merely jump to
               // the second branch, which has the "actual type" that
               // the field/column is supposed to represent.  Again,
               // this is in dire need of fleshing-out...
               fieldDatum.selectBranch( 1 );
               switch( fieldDatum.type() )
               {
                  case avro::AVRO_STRING:
                  {
                     std::string &newValue = fieldDatum.value<std::string>();
                     newValue = "New string set via switching branches in the union";
                     break;
                  }
                  default:
                     break;
               }
            }
         }

         // Write the whole datum once, after all of its fields are populated.
         writer.write( datum );
      }
      writer.close();

This variant produces the following:

    [rh6lgn01][2177] MY_EXAMPLES/generate_recursive$ recursive
    schema=recursive.json
    file=./DATA/recursive.avro
    Top level was a record
    The record had 1 fields.
    Field datum was a union = true
    Field datum 0 was a union.  Current branch = 0
    Field datum 0 is now a string.  Current branch = 1
    ReadFile(): Enter
    ReadFile(): Type = record
    ProcessRecord(): New record found.  Field count = 1
    ProcessRecord(): {
    ProcessRecord():   Field 0: type = string
    ProcessDatum():   Field 0: value = New string set via switching branches in the union (length = 50)
    ProcessRecord(): }
    rowCount = 1
    -----------------------
    
    AVRO Writing and Reading Complete
    [rh6lgn01][2178] MY_EXAMPLES/generate_recursive$

And so it is a satisfactory solution as well.

For me, I'll likely go with the latter approach, as it just somehow seems "cleaner." That said, I think the more correct reason is that I already use the generic "interface" to read Avro files, so using it again for writing seems more congruent. In addition, I prefer the second approach because it doesn't require avrogencpp. YMMV.

I hope this answer helps someone in the future. Best of luck!

Jerry

Upvotes: 6
