Reputation: 154
I'm trying to use a Dataproc cluster to import large CSV files into HDFS, then export them to SequenceFile format, and finally import the latter into Bigtable as described here: https://cloud.google.com/bigtable/docs/exporting-importing
I initially imported the CSV files as an external table in Hive, then exported them by inserting them in a SequenceFile backed table.
However (probably because Dataproc seems to ship with Hive 1.0?), I hit the cast exception error mentioned here: Bigtable import error
I can't seem to get the HBase shell or ZooKeeper up and running on the Dataproc master VM, so I can't run a simple export job from the CLI.
Is there an alternative way to export Bigtable-compatible SequenceFiles from Dataproc?
What's the proper configuration to set up to get HBase and ZooKeeper running on the Dataproc master VM node?
Upvotes: 3
Views: 2118
Reputation: 1528
The import instructions you linked to are instructions for importing data from an existing HBase deployment.
If the input format you're working with is CSV, creating SequenceFiles is probably an unnecessary step. How about writing a Hadoop MapReduce job to process the CSV files and write directly to Cloud Bigtable? A Dataflow pipeline would also be a good fit here.
Take a look at samples here: https://github.com/GoogleCloudPlatform/cloud-bigtable-examples/tree/master/java
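For a rough idea of what the MapReduce route could look like, here is a minimal, untested sketch that uses the bigtable-hbase connector's `BigtableConfiguration` together with the stock HBase `TableOutputFormat`. The project ID, instance ID, table name, column family (`cf`), and CSV layout are all placeholders you'd need to adapt:

```java
import java.io.IOException;

import com.google.cloud.bigtable.hbase.BigtableConfiguration;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class CsvToBigtable {

  // Turns each CSV line into an HBase Put keyed on the first column.
  static class CsvMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    private static final byte[] FAMILY = Bytes.toBytes("cf");  // placeholder column family

    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] fields = line.toString().split(",", -1);
      if (fields.length < 2) {
        return;  // skip malformed rows
      }
      byte[] rowKey = Bytes.toBytes(fields[0]);
      Put put = new Put(rowKey);
      // Write the remaining columns under numbered qualifiers; adapt to your schema.
      for (int i = 1; i < fields.length; i++) {
        put.addColumn(FAMILY, Bytes.toBytes("col" + i), Bytes.toBytes(fields[i]));
      }
      context.write(new ImmutableBytesWritable(rowKey), put);
    }
  }

  public static void main(String[] args) throws Exception {
    // Configures the HBase client to talk to Cloud Bigtable over gRPC,
    // so no local HBase master or ZooKeeper is needed on the cluster.
    Configuration conf = BigtableConfiguration.configure("my-project", "my-instance");  // placeholders

    Job job = Job.getInstance(conf, "csv-to-bigtable");
    job.setJarByClass(CsvToBigtable.class);
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));  // directory holding the CSV files

    job.setMapperClass(CsvMapper.class);
    job.setNumReduceTasks(0);  // map-only: each Put goes straight to Bigtable
    job.setOutputFormatClass(TableOutputFormat.class);
    job.setOutputKeyClass(ImmutableBytesWritable.class);
    job.setOutputValueClass(Put.class);
    job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "my-table");  // placeholder table

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

You'd submit this with `hadoop jar` on the Dataproc master (or via `gcloud dataproc jobs submit hadoop`), pointing the first argument at the CSV directory. Since the connector resolves the Bigtable endpoints itself, the HBase/ZooKeeper services you couldn't start on the master VM aren't needed at all.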
Upvotes: 2