Reputation: 91
I've been facing issues generating Avro files from a BigQuery dataset while trying to maintain a predefined schema. My goal is to export Avro files without any post-processing, ensuring the schema matches the desired structure.
For example, assume I run the following query to create a table whose schema matches my expected Avro schema:
CREATE OR REPLACE TABLE my_project.my_dataset.sample_data (
  name STRING NOT NULL,
  details STRUCT<
    age INT64 NOT NULL,
    city STRING NOT NULL
  > NOT NULL
)
AS (
  SELECT
    "Alice" AS name,
    STRUCT(30 AS age, "Seattle" AS city) AS details
);
When I export this table to Avro, BigQuery automatically modifies the schema, producing something like this:
{
  "type": "record",
  "name": "Root",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "details",
      "type": {
        "type": "record",
        "name": "root.details",
        "fields": [
          {
            "name": "age",
            "type": "long"
          },
          {
            "name": "city",
            "type": "string"
          }
        ]
      }
    }
  ]
}
The expected Avro schema for a file containing that data would be something similar to this:
{
  "type": "record",
  "name": "Person",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "details",
      "type": {
        "type": "record",
        "name": "Details",
        "fields": [
          {
            "name": "age",
            "type": "long"
          },
          {
            "name": "city",
            "type": "string"
          }
        ]
      }
    }
  ]
}
PROBLEM: When exporting to Avro, BigQuery automatically modifies the schema in two key ways. First, it wraps the entire dataset inside a top-level "Root" record, even if the original table structure does not include such a wrapper. Second, it prefixes nested records with "root.", resulting in names like "root.details" instead of preserving the expected struct name, such as "details".
QUESTION: Is there a way to prevent BigQuery from automatically adding the top-level "Root" record during Avro export, and from renaming nested records with a "root." prefix instead of keeping their original names? Again, our goal is to generate an Avro file that strictly follows our predefined schema without requiring post-processing.
Upvotes: 1
Views: 73
Reputation: 92
BigQuery's Avro export does automatically add a top-level "Root" record and prefix nested record names with "root.". As far as I know, there is currently no direct way to prevent this.
If you want BigQuery to support this, you can raise a Feature Request describing your issue and requirements.
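If post-processing turns out to be unavoidable in the meantime, the renaming step itself is small. Below is a minimal sketch in Python, assuming the exported schema has already been read as JSON. The function name `rename_records` and the name mapping are hypothetical, chosen only for illustration. It works on the schema dictionary alone: it does not rewrite an Avro file (that would need an Avro library to re-encode the file with the new writer schema), and it does not handle namespaces or aliases.

```python
import json


def rename_records(schema, mapping):
    """Return a copy of an Avro schema with record names replaced per `mapping`.

    Hypothetical helper: only record names and string references to them
    are rewritten; namespaces and aliases are left untouched.
    """
    def walk(node):
        if isinstance(node, str):
            # Primitive type name or a reference to a named type.
            return mapping.get(node, node)
        if isinstance(node, list):
            # Union: rewrite each branch.
            return [walk(branch) for branch in node]
        node = dict(node)  # shallow copy so the input is not mutated
        kind = node.get("type")
        if kind == "record":
            node["name"] = mapping.get(node["name"], node["name"])
            node["fields"] = [
                {**field, "type": walk(field["type"])} for field in node["fields"]
            ]
        elif kind == "array":
            node["items"] = walk(node["items"])
        elif kind == "map":
            node["values"] = walk(node["values"])
        elif isinstance(kind, (dict, list)):
            node["type"] = walk(kind)
        return node

    return walk(schema)


# Schema as shown in the question's export output.
EXPORTED = {
    "type": "record",
    "name": "Root",
    "fields": [
        {"name": "name", "type": "string"},
        {
            "name": "details",
            "type": {
                "type": "record",
                "name": "root.details",
                "fields": [
                    {"name": "age", "type": "long"},
                    {"name": "city", "type": "string"},
                ],
            },
        },
    ],
}

renamed = rename_records(EXPORTED, {"Root": "Person", "root.details": "Details"})
print(json.dumps(renamed, indent=2))
```

Keeping the mapping explicit, rather than stripping the "root." prefix automatically, makes it clear which target name each exported record is expected to get.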
Upvotes: 2