A.M

Reputation: 1

BigQuery: NodeJS client library does not seem to respect useAvroLogicalTypes when performing a load job from Cloud Storage

I am trying to run a job from my Node.js server that loads an AVRO file from Cloud Storage into a BigQuery table. The job runs successfully, but the date column is loaded as type INTEGER in the table. I included the useAvroLogicalTypes parameter in the load job, but it doesn't seem to have any effect.

If I cast the date column in the table with SELECT DATE(TIMESTAMP_MILLIS(date)) I do get the correct date, but I am hoping to avoid this extra transformation step. I have read everywhere that Avro logical types can be converted implicitly if the parameter is set, yet I have been unable to get it to work. The table was created by the job itself, so there was no pre-existing schema.
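For reference, the interim workaround looks like this when run through the client library (my_dataset and my_table are placeholders for the real identifiers):

// The interim cast workaround, run through the client library;
// my_dataset and my_table stand in for the real dataset and table.
bigquery
  .query({
    query: 'SELECT DATE(TIMESTAMP_MILLIS(date)) AS date, medium, source, campaign ' +
      'FROM `my_dataset.my_table`',
  })
  .then(([rows]) => console.log(rows));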

The client library versions I am using are 4.4.0 for @google-cloud/bigquery and 4.1.2 for @google-cloud/storage.

AVRO SCHEMA:

const schema = {
    "name": "root",
    "type": "record",
    "fields": [
      { "name": "date", "type": ["null", { "type": "long", "logicalType": "date" }]},
      { "name": "medium", "type": ["null", "string"] },
      { "name": "source", "type": ["null", "string"] },
      { "name": "campaign", "type": ["null", "string"] },
    ]
  };
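For reference, the Avro specification pairs the date logical type with an int counting days since the Unix epoch, while millisecond longs map to timestamp-millis; the long + date combination above is not a pairing the spec defines. A sketch of the two spec-compliant variants of the field:

// Spec-compliant logical-type pairings for the date field (a sketch,
// not taken from the original post):
const dateAsDays = {
  "name": "date",
  "type": ["null", { "type": "int", "logicalType": "date" }] // days since epoch
};
const dateAsMillis = {
  "name": "date",
  "type": ["null", { "type": "long", "logicalType": "timestamp-millis" }] // ms since epoch
};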

JOB CODE

const options = {
    sourceFormat: 'AVRO',
    writeDisposition: 'WRITE_TRUNCATE',
    useAvroLogicalTypes: true,
    datasetID,
  };

bigquery
    .dataset(datasetID)
    .table(tableID)
    .load(storage.bucket(bucketName).file(fileName), options)
    .then(results => {

      const res = results[0];

      // load() waits for the job to finish
      console.log(`Job ${res.id} completed.`);

      // Check the job's status for errors
      const errors = res.status.errors;

      if (errors && errors.length > 0) {
        E = errors; // E: outer variable used to hand the errors back to the Fiber
      }

      // This kicks execution back to where the Fiber.yield() call stopped it
      fiber.resume();
    })
    .catch(err => {
      console.error('ERROR:', err);
    });
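To check whether the flag actually reached the service, the job's configuration can be inspected after the fact. A minimal debugging sketch, assuming jobId holds the id logged above:

// Minimal sketch (assumption: jobId holds the job id logged above).
// Fetch the job and inspect whether useAvroLogicalTypes made it into
// the load configuration the service actually executed.
bigquery
  .job(jobId)
  .getMetadata()
  .then(([metadata]) => {
    console.log('useAvroLogicalTypes:', metadata.configuration.load.useAvroLogicalTypes);
  });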

Sample raw data:

data = [
{"date":"2019-08-01","medium":"(none)","source":"(direct)","campaign":"(not set)","users":3053},
{"date":"2019-08-01","medium":"(not set)","source":"email-client","campaign":"(not set)","users":3},
{"date":"2019-08-01","medium":"affiliate","source":"sdn","campaign":"(not set)","users":1},
{"date":"2019-08-01","medium":"email","source":"corner","campaign":"onboarding","users":1},
{"date":"2019-08-01","medium":"email","source":"custom-playlist","campaign":"fonboarding","users":1},
{"date":"2019-08-01","medium":"email","source":"deref-mail.com","campaign":"(not set)","users":2},
{"date":"2019-08-01","medium":"email","source":"faketempmail","campaign":"(not set)","users":1},
{"date":"2019-08-01","medium":"email","source":"fundx","campaign":"email_campaign","users":1},
{"date":"2019-08-01","medium":"email","source":"email-client","campaign":"(not set)","users":14},
{"date":"2019-08-01","medium":"email","source":"email-client","campaign":"100k","users":2},
]

I convert the date property to a long using Moment.js, Underscore.js, and a simple map function:

data = _.map(data, row => {
  const m = moment(row.date);
  row.date = m.isValid() ? m.valueOf() : null; // epoch milliseconds
  return row;
});
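If the schema keeps the date logical type, the value the Avro spec expects is days since the epoch rather than milliseconds. A sketch of that conversion, assuming the int + date pairing shown earlier:

// Sketch: convert "YYYY-MM-DD" to days since the Unix epoch (86400000 ms
// per day), matching the Avro spec's int + date pairing. moment.utc avoids
// timezone off-by-one errors.
data = _.map(data, row => {
  const m = moment.utc(row.date, 'YYYY-MM-DD');
  row.date = m.isValid() ? Math.floor(m.valueOf() / 86400000) : null;
  return row;
});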

Upvotes: 0

Views: 1000

Answers (1)

F10

Reputation: 2883

What you describe is the default behavior: the Avro logical type date is stored as INTEGER in BigQuery when useAvroLogicalTypes is not set.

This also depends on how your AVRO schema was built. For instance, I had to build an AVRO file with the schema

"fields":[
    {"logicaltype": "date", "type": "string", "name": "field1"}
]

and I uploaded my DATE data correctly by using the following code:

const metadata = {
  sourceFormat: 'AVRO',
  useAvroLogicalTypes: true,
  createDisposition: 'CREATE_IF_NEEDED',
  writeDisposition: 'WRITE_TRUNCATE',
  schema: {
    fields: [
      {
        name: "field1",
        type: "DATE",
        logicalType: "STRING",
        mode: "NULLABLE"
      }
    ],
  },
  location: 'US',
};

const [job] = await bigquery
  .dataset(datasetId)
  .table(tableId)
  .load(storage.bucket(bucketName).file(filename), metadata);

Based on the data that you shared, it should work with this configuration, since your date values are strings.
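If the date values are left as the original "YYYY-MM-DD" strings to match that schema, the Moment.js conversion step can be dropped. A minimal sketch that only validates the strings (the strict-format check is an assumption, not from the original post):

// Sketch: keep date as the original "YYYY-MM-DD" string and null out
// values that fail strict-format validation.
data = _.map(data, row => {
  row.date = moment(row.date, 'YYYY-MM-DD', true).isValid() ? row.date : null;
  return row;
});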

Hope it helps.

Upvotes: 0
