py-r

Reputation: 451

Flink-BigTable - Any connector?

I would like to use BigTable as a sink for a Flink job:

  1. Is there an out-of-the-box connector?
  2. Can I use the DataStream API?
  3. How can I optimally write a sparse object (99% sparsity), i.e. ensure that no key/values are created in BigTable for nulls?

I have searched the documentation for the above topics but couldn't find answers to those questions.

Thanks for your support!

Upvotes: 2

Views: 1043

Answers (2)

Bora

Reputation: 131

There is now a Flink Bigtable connector: https://github.com/google/flink-connector-gcp/tree/main/connectors/bigtable

Upvotes: 4

Igor Dvorzhak

Reputation: 4465

I do not think that Flink has a native BigTable connector.

That said, you can use the Flink HBase SQL Connector together with the BigTable HBase client to access BigTable from Flink:

Flink job <-> Flink HBase SQL Connector <-> BigTable HBase client <-> BigTable
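
A minimal sketch of what that could look like via the Table API (which you can bridge to from the DataStream API), assuming the BigTable HBase client is configured through an hbase-site.xml on the classpath pointing at your Bigtable project/instance; the table name, column family, and columns below are placeholders:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class BigtableViaHBaseConnectorSketch {

    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        // Register a sink table backed by the Flink HBase SQL connector.
        // The BigTable HBase client is expected to be picked up through the
        // hbase-site.xml on the classpath; names below are placeholders.
        tableEnv.executeSql(
            "CREATE TABLE bigtable_sink (" +
            "  rowkey STRING," +
            "  cf ROW<col1 STRING, col2 BIGINT>," +
            "  PRIMARY KEY (rowkey) NOT ENFORCED" +
            ") WITH (" +
            "  'connector' = 'hbase-2.2'," +
            "  'table-name' = 'my_table'," +
            // The connector requires a ZooKeeper quorum option even though
            // Bigtable itself does not use ZooKeeper.
            "  'zookeeper.quorum' = 'unused:2181'" +
            ")");

        // From here you can INSERT INTO bigtable_sink from another table, or
        // convert a DataStream into a table with tableEnv.fromDataStream(...).
    }
}
```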

This connector appears to be similar to the Flink HBase connector proposed by Cloudera, which can be installed manually (see @rsantiago's comment).

A possible approach for persisting sparse data can be taken from Cloudera's example, where columns are added with put.addColumn; at that point you could check whether each value is null and discard it (see @rsantiago's comment).
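
As a rough illustration of that idea (the class and method names are placeholders, not taken from Cloudera's example verbatim), the null check before put.addColumn could look like this:

```java
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

import java.util.Map;

public class SparsePutSketch {

    // Build a Put that only adds cells for non-null values, so the ~99% of
    // fields that are null never produce key/values in BigTable.
    public static Put toSparsePut(String rowKey, String columnFamily, Map<String, String> fields) {
        Put put = new Put(Bytes.toBytes(rowKey));
        byte[] cf = Bytes.toBytes(columnFamily);
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String value = field.getValue();
            if (value == null) {
                continue; // discard nulls: no cell is created for this qualifier
            }
            put.addColumn(cf, Bytes.toBytes(field.getKey()), Bytes.toBytes(value));
        }
        return put;
    }
}
```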

Upvotes: 0
