Passos

Reputation: 91

How to Prevent BigQuery from Adding a Top-Level 'Root' Record and Auto-Prefixing Nested Fields in Avro Export?

I've been having trouble generating Avro files from a BigQuery table while keeping a predefined schema. My goal is to export Avro files that already match the desired schema, without any post-processing.

For example, suppose I run the following query to create a table whose schema matches my expected Avro schema:

CREATE OR REPLACE TABLE my_project.my_dataset.sample_data (
    name STRING NOT NULL,
    details STRUCT<
      age INT64 NOT NULL,
      city STRING NOT NULL
    > NOT NULL
)
AS (
  SELECT
      "Alice" AS name,
      STRUCT(30 AS age, "Seattle" AS city) AS details
);
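The question does not show how the export itself is run. As one illustration, assuming it is done with GoogleSQL's EXPORT DATA statement, it might look like the sketch below; the bucket my-bucket and the object path are hypothetical placeholders. As far as I know, none of the documented EXPORT DATA options lets you choose the Avro record names.

```sql
-- Export the table to Avro files in Cloud Storage.
-- 'gs://my-bucket/sample_data/*.avro' is a hypothetical destination;
-- the URI must contain a single '*' wildcard for sharded output.
EXPORT DATA
  OPTIONS (
    uri = 'gs://my-bucket/sample_data/*.avro',
    format = 'AVRO',
    overwrite = true
  )
AS
SELECT name, details
FROM my_project.my_dataset.sample_data;
```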

When I export this table to Avro, BigQuery modifies the schema automatically, producing something like this:

{
  "type": "record",
  "name": "Root",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "details",
      "type": {
        "type": "record",
        "name": "root.details",
        "fields": [
          {
            "name": "age",
            "type": "long"
          },
          {
            "name": "city",
            "type": "string"
          }
        ]
      }
    }
  ]
}

The expected Avro schema for a file containing that data would be something similar to this:

{
  "type": "record",
  "name": "Person",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "details",
      "type": {
        "type": "record",
        "name": "Details",
        "fields": [
          {
            "name": "age",
            "type": "long"
          },
          {
            "name": "city",
            "type": "string"
          }
        ]
      }
    }
  ]
}

PROBLEM: When exporting to Avro, BigQuery modifies the schema in two key ways. First, it wraps the entire table inside a top-level record named "Root", even though the original table structure has no such wrapper. Second, it prefixes nested record names with "root.", producing names like "root.details" instead of preserving the expected struct name, such as "details".
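One way to confirm that the wrapper comes from the export and not from the table is to list the table's column and nested field paths. A minimal sketch, assuming the same placeholder project, dataset, and table names as above, using the INFORMATION_SCHEMA.COLUMN_FIELD_PATHS view:

```sql
-- List every top-level column and nested field path with its type.
-- Project, dataset, and table names are the placeholders from the question.
SELECT field_path, data_type
FROM `my_project`.my_dataset.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
WHERE table_name = 'sample_data'
ORDER BY field_path;
```

For the sample table this should list only name, details, details.age, and details.city, with nothing resembling a Root record.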

QUESTION: Is there a way to prevent BigQuery from adding the top-level "Root" record during Avro export, and from renaming nested records with a "root." prefix instead of keeping their original names? Again, our goal is to generate Avro files that strictly follow our predefined schema without any post-processing.

Upvotes: 1

Views: 73

Answers (1)

Sushma Jeerahalli

Reputation: 92

BigQuery's Avro export does automatically add a top-level "Root" record and prefix nested record names with "root.", and as far as I know there is currently no direct way to prevent this.

If you want this supported in BigQuery, feel free to raise a feature request describing your issue and requirements.

Upvotes: 2
