sr7

Reputation: 229

What's the recommended way of copying data from a Hive table to BigQuery?

I have seen "Move data from hive tables in Google Dataproc to BigQuery" and "Migrate hive table to Google BigQuery".

But the issue with distcp is that it only moves data from HDFS to GCS, and my tables are in ORC format. Also, so far BigQuery claims to support only JSON, CSV, and Avro.

So I need help transferring data from a Hive table (ORC format) to BigQuery (any format).

Upvotes: 2

Views: 4220

Answers (2)

Tutu Kumari

Reputation: 503

ORC is supported, and you can easily create a table from the GCP console:

https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-orc

I have done it myself.

NOTE: In the case of Hive tables, the column names are not stored in the ORC files, so when you load them the table will come up with generic placeholder column names. Once the table is created, you need to rename and update the column names.

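Besides the console, the load can also be scripted with the bq CLI. A minimal sketch, assuming hypothetical bucket, dataset, and table names (the script only builds and prints the command, so you can inspect it before running):

```shell
# Hypothetical names -- replace with your own.
BUCKET="my-bucket"
DATASET="my_dataset"
TABLE="hive_orc_table"

# ORC is self-describing, so no schema flag is needed;
# BigQuery infers the schema (including the placeholder
# column names) from the ORC files themselves.
CMD="bq load --source_format=ORC ${DATASET}.${TABLE} gs://${BUCKET}/warehouse/my_table/*.orc"

echo "$CMD"
# Once the paths are verified, run it with:
# eval "$CMD"
```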

Upvotes: 0

Sourygna

Reputation: 719

As mentioned by Elliot, ORC is not supported, so you have to convert your ORC data into one of the three formats you mentioned. I would personally go with Avro, because its serialization is more robust than JSON or CSV.

So the process to follow is:

  1. Create your BigQuery table with the correct data types (this must be the first step, to ensure proper casting of some Avro logical types such as Timestamp)
  2. Launch a Hive query to generate the data in Avro format. See this SQL example.
  3. distcp the files to Google Cloud Storage
  4. "bq load" them into your table
  5. Check that you haven't made any mistakes by verifying that the Hive and BigQuery tables contain the same data: https://github.com/bolcom/hive_compared_bq
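Steps 2-4 above can be sketched as shell commands. This is a dry run that only prints each command rather than executing it, and all table, path, and bucket names are hypothetical placeholders:

```shell
# Hypothetical locations -- replace with your own.
HDFS_AVRO_DIR="/user/hive/warehouse/my_table_avro"
GCS_DIR="gs://my-bucket/avro_export/my_table"

# Step 2: export the Hive table as Avro (run inside Hive).
HIVE_QUERY="CREATE TABLE my_table_avro STORED AS AVRO AS SELECT * FROM my_table;"
echo "hive -e \"$HIVE_QUERY\""

# Step 3: copy the Avro files from HDFS to Cloud Storage.
echo "hadoop distcp ${HDFS_AVRO_DIR} ${GCS_DIR}"

# Step 4: load into the BigQuery table created in step 1.
# --use_avro_logical_types keeps logical types such as
# timestamp-micros from being loaded as plain integers.
echo "bq load --source_format=AVRO --use_avro_logical_types my_dataset.my_table ${GCS_DIR}/*.avro"
```

Remove the `echo` wrappers (or `eval` the printed lines) once you have verified the names and paths against your own environment.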

Upvotes: 3
