Reputation: 1
I am trying to run a job from my Node.js server, which loads an AVRO file on Cloud Storage into a BigQuery table. The job runs successfully; however, the date column is loaded as type INTEGER in the table. I included the useAvroLogicalTypes param in the load job, but it doesn't seem to have any effect.
If I cast the date column in the table using SELECT DATE(TIMESTAMP_MILLIS(date)), I do get the correct date, but I am hoping to avoid this extra transformation step. I have read everywhere that Avro logical types can be cast implicitly if the parameter is set, but I have been unable to get it to work. The table was created by the job, so there was no pre-existing schema.
Client library versions I am using are:
4.4.0 for @google-cloud/bigquery
4.1.2 for @google-cloud/storage
AVRO SCHEMA:
const schema = {
  "name": "root",
  "type": "record",
  "fields": [
    { "name": "date", "type": ["null", { "type": "long", "logicalType": "date" }] },
    { "name": "medium", "type": ["null", "string"] },
    { "name": "source", "type": ["null", "string"] },
    { "name": "campaign", "type": ["null", "string"] }
  ]
};
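For reference, the Avro specification defines the date logical type as annotating an int that stores the number of days since the Unix epoch (1970-01-01), whereas the schema above attaches it to a long holding milliseconds. A spec-style declaration of this field (a sketch for comparison, not the schema used in the job) would look like this:
// Reference sketch, not the asker's schema: Avro's date logical type
// annotates an int counting days since 1970-01-01.
{ "name": "date", "type": ["null", { "type": "int", "logicalType": "date" }] }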
JOB CODE:
const options = {
  sourceFormat: 'AVRO',
  writeDisposition: 'WRITE_TRUNCATE',
  useAvroLogicalTypes: true,
  datasetID,
};
bigquery
  .dataset(datasetID)
  .table(tableID)
  .load(storage.bucket(bucketName).file(fileName), options)
  .then(results => {
    // res, E and fiber are presumably defined in the surrounding (Fibers-based) scope
    res = results[0];
    // load() waits for the job to finish
    console.log(`Job ${res.id} completed.`);
    // Check the job's status for errors
    const errors = res.status.errors;
    if (errors && errors.length > 0) {
      E = errors;
    }
    // This kicks the execution back to where the Fiber.yield() statement stopped it
    fiber.resume();
  })
  .catch(err => {
    console.error('ERROR:', err);
  });
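To check how the load job actually typed the column, one option (a sketch using the same @google-cloud/bigquery client, not code from the job itself) is to read the table metadata after the job completes; the date field will show up as INTEGER if the logical type was not applied:
bigquery
  .dataset(datasetID)
  .table(tableID)
  .getMetadata()
  .then(([metadata]) => {
    // Logs the loaded schema, e.g. { name: 'date', type: 'INTEGER', mode: 'NULLABLE' }
    console.log(metadata.schema.fields);
  })
  .catch(err => console.error('ERROR:', err));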
Sample Raw data:
data = [
{"date":"2019-08-01","medium":"(none)","source":"(direct)","campaign":"(not set)","users":3053},
{"date":"2019-08-01","medium":"(not set)","source":"email-client","campaign":"(not set)","users":3},
{"date":"2019-08-01","medium":"affiliate","source":"sdn","campaign":"(not set)","users":1},
{"date":"2019-08-01","medium":"email","source":"corner","campaign":"onboarding","users":1},
{"date":"2019-08-01","medium":"email","source":"custom-playlist","campaign":"fonboarding","users":1},
{"date":"2019-08-01","medium":"email","source":"deref-mail.com","campaign":"(not set)","users":2},
{"date":"2019-08-01","medium":"email","source":"faketempmail","campaign":"(not set)","users":1},
{"date":"2019-08-01","medium":"email","source":"fundx","campaign":"email_campaign","users":1},
{"date":"2019-08-01","medium":"email","source":"email-client","campaign":"(not set)","users":14},
{"date":"2019-08-01","medium":"email","source":"email-client","campaign":"100k","users":2},
]
I convert the date property to a long using Moment.js, Underscore.js, and a simple map function:
data = _.map(data, row => {
  row.date = moment(row.date).isValid() ? +moment(row.date).valueOf() : null;
  return row;
});
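For comparison only (a sketch, not the conversion actually used above): if the Avro field were declared with the spec-style int/date type shown earlier, the value written would be days since the Unix epoch rather than milliseconds, for example:
data = _.map(data, row => {
  // Days since 1970-01-01, the representation Avro's date logical type expects on an int
  row.date = moment(row.date).isValid()
    ? moment.utc(row.date).diff(moment.utc('1970-01-01'), 'days')
    : null;
  return row;
});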
Upvotes: 0
Views: 1000
Reputation: 2883
The behavior you mention is the "default" one: the Avro logical type date will be stored as INTEGER in BigQuery when you are not using useAvroLogicalTypes.
This also depends on how your AVRO schema was made. For instance, I had to build an AVRO file with the schema
"fields": [
  { "logicalType": "date", "type": "string", "name": "field1" }
]
and I uploaded my DATE data correctly by using the following code:
const metadata = {
  sourceFormat: 'AVRO',
  useAvroLogicalTypes: true,
  createDisposition: 'CREATE_IF_NEEDED',
  writeDisposition: 'WRITE_TRUNCATE',
  schema: {
    fields: [
      {
        name: "field1",
        type: "DATE",
        logicalType: "STRING",
        mode: "NULLABLE"
      }
    ],
  },
  location: 'US',
};
const [job] = await bigquery
  .dataset(datasetId)
  .table(tableId)
  .load(storage.bucket(bucketName).file(filename), metadata);
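One quick way to confirm the result (a sketch, not part of the original answer; it assumes the same datasetId and tableId variables) is to query the column after the load; with the DATE schema above, the Node.js client returns BigQueryDate objects:
// Verify that the loaded column is now typed as DATE.
const [rows] = await bigquery.query({
  query: `SELECT field1 FROM \`${datasetId}.${tableId}\` LIMIT 5`,
  location: 'US',
});
// Each value prints as BigQueryDate { value: '2019-08-01' } when the column type is DATE
rows.forEach(row => console.log(row.field1));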
Based on the data that you shared, it should work with this configuration, since your date values are strings.
Hope it helps.
Upvotes: 0