007chungking

Reputation: 55

BigQuery error in mk operation: Error while reading table... "Failed to add partition key < > (type: TYPE_INT64) to schema"

I have successfully loaded a great deal of externally Hive-partitioned data (in Parquet files) into BigQuery without issue.

All my tables share the same schema and live in the same dataset. However, some tables fail when I call the bq mk command with the external table definition files I've written.

The full output of the error after bq mk is as follows:

BigQuery error in mk operation: Error while reading table: my_example, error message: Failed to add partition key my_common_key (type: TYPE_INT64) to schema, because another column with the same name was already present. This is not allowed. Full partition schema: [my_common_key:TYPE_INT64, my_secondary_common_key:TYPE_INT64].

The external table definition files look like this:

{   "hivePartitioningOptions": {
     "mode": "AUTO",
     "sourceUriPrefix": "gs://common_bucket/my_example"   },  
 "parquetOptions": {
     "enableListInference": false,
     "enumAsString": false   },   "sourceFormat": "PARQUET",   "sourceUris": [
     "gs://common_bucket/my_example/*"   ]

You will see that I am relying on auto inference for the schema, with a source URI prefix, as there are numerous Parquet files nested within two levels of subdirectories that Hive uses as partition keys. The full path is gs://common_bucket/my_example/my_common_key=2022/my_secondary_common_key=1, and within each such leaf directory there are several Parquet files.
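
For completeness, the equivalent setup through the google-cloud-bigquery Python client looks roughly like this (a sketch, not my exact code; my_project.my_dataset is a placeholder for the real table path, and the parquetOptions above are the client defaults, so they are omitted):

    from google.cloud import bigquery

    client = bigquery.Client()

    # Mirror the JSON definition file: Parquet source with AUTO hive partitioning.
    external_config = bigquery.ExternalConfig("PARQUET")
    external_config.source_uris = ["gs://common_bucket/my_example/*"]

    hive_opts = bigquery.HivePartitioningOptions()
    hive_opts.mode = "AUTO"
    hive_opts.source_uri_prefix = "gs://common_bucket/my_example"
    external_config.hive_partitioning = hive_opts

    table = bigquery.Table("my_project.my_dataset.my_example")  # placeholder IDs
    table.external_data_configuration = external_config
    client.create_table(table)  # raises the same duplicate-column error for affected tables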

Upvotes: 1

Views: 1771

Answers (1)

Navneet Kumar

Reputation: 3752

Please check the data files under your bucket: has the Hive table evolved over time, so that older files contain the partition column (my_common_key) as a regular data column, while the table is now partitioned by that same key? I recently migrated a large set of Hive tables and hit a similar issue; the current Hive table structure and the underlying data had diverged over time.
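
A quick way to confirm this (a rough sketch, assuming the pyarrow and gcsfs packages, read access to the bucket, and that the data files carry a .parquet suffix) is to scan the embedded schemas:

    import gcsfs
    import pyarrow.parquet as pq

    fs = gcsfs.GCSFileSystem()

    # Flag any file whose embedded schema already carries a partition column.
    for path in fs.find("common_bucket/my_example"):
        if not path.endswith(".parquet"):
            continue
        with fs.open(path, "rb") as f:
            names = pq.ParquetFile(f).schema_arrow.names
        if "my_common_key" in names or "my_secondary_common_key" in names:
            print("conflicting column embedded in:", path)

Any file this prints is one where the partition key is baked into the Parquet payload as well as the directory name, which is exactly the conflict the bq mk error describes.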

One way to solve this is to read the data with Hive on Dataproc, export it back to a GCS location, and then load that into BigQuery. You can also use Spark SQL to do this.
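
If you go the Spark route, a minimal sketch (my_example_clean is a hypothetical output prefix) is to read the partitioned tree and write it back so the keys live only in the directory names, not inside the files:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("strip-embedded-partition-cols").getOrCreate()

    # Partition discovery fills my_common_key / my_secondary_common_key from the
    # directory names; depending on the Spark version, files that also embed those
    # columns may need to be read leaf-by-leaf and the embedded copies dropped first.
    df = spark.read.parquet("gs://common_bucket/my_example/")

    (df.write
       .mode("overwrite")
       .partitionBy("my_common_key", "my_secondary_common_key")
       .parquet("gs://common_bucket/my_example_clean/"))

Point the external table definition at the cleaned prefix afterwards and verify the output schema before recreating the table.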

Upvotes: 1
