Ashton
Ashton

Reputation: 1365

BigQuery Avro load job with useAvroLogicalTypes incorrect field types

I'm using node.js to:

My problem is that even if I set useAvroLogicalTypes when I create the load job, my date data is never correctly created as TIMESTAMP in BQ, always INTEGER - which should be the normal behavior when useAvroLogicalTypes is NOT set.

According to the same docs, if timestamp-millis is set in the Avro schema definition, it should load as TIMESTAMP.

My configuration:

Avro schema

{
      "name": "metadata",
      "type": {
        "name": "metadata",
        "type": "record",
        "fields": [
          {
            "name": "creationTime",
            "type": "long",
            "logicalType": "timestamp-millis"
          },
          {
            "name": "lastActivity",
            "type": ["null", "long"],
            "logicalType": "timestamp-millis"
          },
          {
            "name": "deletionTime",
            "type": ["null", "long"],
            "logicalType": "timestamp-millis"
          },
          {
            "name": "lastSignInTime",
            "type": ["null", "long"],
            "logicalType": "timestamp-millis"
          }
        ]
      }
    }

Create load job

const metadata = {
  sourceFormat: 'AVRO',
  useAvroLogicalTypes: true
}

The only way I could get it work is by specifying the scema again in the metadata var for the load job. My question is, isn't the whole point of specifying the Avro schema and turning on useAvroLogicalTypes to avoid having to explicitly spec the schema again the the load job metadata?

Working setup

const metadata = {
    sourceFormat: 'AVRO',
    useAvroLogicalTypes: true,
    schema: {
      fields: [
        ...otherFields,
        {
          name: 'metadata',
          type: 'STRUCT',
          mode: 'REQUIRED',
          fields: [
            { name: 'creationTime', type: 'TIMESTAMP', mode: 'REQUIRED' },
            { name: 'lastActivity', type: 'TIMESTAMP', mode: 'NULLABLE' },
            { name: 'deletionTime', type: 'TIMESTAMP', mode: 'NULLABLE' },
            { name: 'lastSignInTime', type: 'TIMESTAMP', mode: 'NULLABLE' }
          ]
        }
      ]
    }
  }

Upvotes: 1

Views: 765

Answers (1)

Alexandre Moraes
Alexandre Moraes

Reputation: 4051

Answering your question about specifying the Avro schema during creating the load job. It is possible to omit schema property, which is the schema for the destination table:

The schema can be omitted if the destination table already exists, or if you're loading data from Google Cloud Datastore.

If you want to read more about it, please refer to the documentation.

In addition, you can also take a look to the --autodetect flag which replaces the schema definition.

I hope it will help you.

Upvotes: 2

Related Questions