codemoped

Reputation: 245

Support for Cloud Bigtable as Sink in Cloud Dataflow

Are there plans to enable Cloud Dataflow to write data to Cloud Bigtable? Is it even possible?

Adding a custom Sink to handle the IO would probably be the clean choice.

As a workaround, I tried connecting to Bigtable (in the same project) from a simple DoFn, opening the connection and table in startBundle and closing them in finishBundle.
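For reference, the DoFn looks roughly like this (a minimal sketch against the Dataflow 1.x DoFn and HBase 1.0 client APIs; the table name, column family, and qualifier are placeholders):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

import com.google.cloud.dataflow.sdk.transforms.DoFn;

// Writes each element as a row to a Bigtable table via the HBase client;
// the connection lifecycle is tied to the bundle.
public class BigtableWriteFn extends DoFn<String, Void> {
  private transient Connection connection;
  private transient Table table;

  @Override
  public void startBundle(Context c) throws Exception {
    // Picks up the modified hbase-site.xml from the classpath (resources folder).
    Configuration config = HBaseConfiguration.create();
    connection = ConnectionFactory.createConnection(config);
    table = connection.getTable(TableName.valueOf("my-table")); // placeholder table name
  }

  @Override
  public void processElement(ProcessContext c) throws Exception {
    // Use the element as the row key and write a single placeholder cell.
    Put put = new Put(Bytes.toBytes(c.element()));
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qualifier"), Bytes.toBytes("value"));
    table.put(put);
  }

  @Override
  public void finishBundle(Context c) throws Exception {
    table.close();
    connection.close();
  }
}
```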

I also added the bigtable-hbase jar (0.1.5) to the classpath and a modified version of hbase-site.xml to the resources folder, which gets picked up.

When running in the cloud, I get an "NPN/ALPN extensions not installed" exception.

When running locally, I get an exception stating that ComputeEngineCredentials cannot find the metadata server, despite having set GOOGLE_APPLICATION_CREDENTIALS to the generated JSON key file.
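As a sanity check for the local run, something like the following can verify that application default credentials resolve to the JSON key rather than the metadata server (a minimal sketch using the google-api-client library; the bigtable-hbase client may resolve credentials through its own configuration, so this only confirms the environment variable is being picked up):

```java
import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;

public class CredentialCheck {
  public static void main(String[] args) throws Exception {
    // With GOOGLE_APPLICATION_CREDENTIALS pointing at the JSON key file, this should
    // return service-account credentials instead of falling back to the Compute
    // Engine metadata server.
    GoogleCredential credential = GoogleCredential.getApplicationDefault();
    System.out.println("Service account: " + credential.getServiceAccountId());
  }
}
```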

Any help would be greatly appreciated.

Upvotes: 1

Views: 634

Answers (2)

Solomon Duskis

Reputation: 2711

We now have a Cloud Bigtable / Dataflow connector. You can see more at: https://cloud.google.com/bigtable/docs/dataflow-hbase

Upvotes: 4

Jeremy Lewi

Reputation: 6776

Cloud Bigtable requires the NPN/ALPN networking jar, which is currently not installed on the Dataflow workers, so accessing Cloud Bigtable directly from a ParDo won't work.

One possible workaround is to use the HBase REST API: set up a REST server on a VM outside of Dataflow that accesses Cloud Bigtable. These instructions might help.

You could then issue REST requests to that server from your pipeline. This could get somewhat complicated if you're sending a lot of requests (i.e., processing large amounts of data), since you would need to set up multiple instances of the REST server and load balance across them.
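As a rough illustration, a single-cell read through such a gateway could look like this (a sketch only; the host, table, row key, and column are placeholders, and the gateway is assumed to run the standard HBase REST service on its default port 8080):

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class HbaseRestReadExample {
  public static void main(String[] args) throws Exception {
    // Placeholder gateway host, table, row key, and column.
    URL url = new URL("http://rest-gateway-vm:8080/my-table/row-1/cf:qualifier");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    // Ask for the raw cell value instead of the JSON/XML CellSet representation.
    conn.setRequestProperty("Accept", "application/octet-stream");

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (InputStream in = conn.getInputStream()) {
      byte[] buffer = new byte[4096];
      int n;
      while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
      }
    } finally {
      conn.disconnect();
    }
    System.out.println("Cell value: " + out.toString("UTF-8"));
  }
}
```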

Upvotes: 0
