Reputation: 893
I was trying to create a BigQuery external table backed by Parquet files on GCS, but it fails with a wrong-format error.
Creating a native table from the same files works fine. Why must it be a native table?
If I use a native table, how can I import more data into it? I don't want to delete and recreate the table every time I get new data.
Any help will be appreciated.
Upvotes: 0
Views: 8146
Reputation: 460
The current Google documentation can be a bit tricky to follow. It is a two-step process: first create a definition file, then use it as input to create the table.
If you are dealing with an unpartitioned folder, create the definition file with:
bq mkdef \
--source_format=PARQUET \
"<path/to/parquet/folder>/*.parquet" > "<definition/file/path>"
Otherwise, if you are dealing with a Hive-partitioned table:
bq mkdef \
--autodetect \
--source_format=PARQUET \
--hive_partitioning_mode=AUTO \
--hive_partitioning_source_uri_prefix="<path/to/hive/table/folder>" \
"<path/to/hive/table/folder>/*.parquet" > "<definition/file/path>"
Note: path/to/hive/table/folder should not include the partition folders themselves.
E.g., if your table is stored as gs://project-name/tablename/year=2009/part-000.parquet:
bq mkdef \
--autodetect \
--source_format=PARQUET \
--hive_partitioning_mode=AUTO \
--hive_partitioning_source_uri_prefix="gs://project-name/tablename" \
"gs://project-name/tablename/*.parquet" > "def_file_name"
Finally, create the table from the definition file:
bq mk --external_table_definition="<definition/file/path>" "<project_id>:<dataset>.<table_name>"
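Once created, the external table can be queried like any native table. A minimal sketch (the project, dataset, and table names here are placeholders, not from the question):

```shell
# Query the newly created external table; BigQuery reads the
# Parquet files from GCS at query time.
bq query --use_legacy_sql=false \
'SELECT COUNT(*) FROM `my-project.my_dataset.my_external_table`'
```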
Upvotes: 3
Reputation: 192
This appears to be supported now, at least in beta; as far as I can tell it only works in us-central1.
In the console, simply select 'External table' as the table type and set 'Parquet' as the file format.
Upvotes: 3
Reputation: 33705
Parquet is not currently a supported data format for federated tables. You can repeatedly load more data into the same table as long as you append (instead of overwriting) the current contents.
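To append rather than overwrite, you can run `bq load` against the existing table: by default it appends, while `--replace` would overwrite. A hedged sketch, assuming a hypothetical bucket path and table name:

```shell
# Append newly arrived Parquet files to an existing native table.
# bq load appends by default; pass --replace only if you want to
# overwrite the table's current contents.
bq load \
--source_format=PARQUET \
my_dataset.my_table \
"gs://my-bucket/new-data/*.parquet"
```

Each new batch of files can be loaded this way without recreating the table.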
Upvotes: 1